Unity Vision

One of my favorite challenges in sci-fi is showing how alien an AI mind is. (It’s part of what makes Ex Machina so compelling, and the end of Her, and why Data from Star Trek: The Next Generation always read like a dopey, Pinocchio-esque narrative tool. But a full comparison is for another post.) Given that screen sci-fi is a medium of light, sound, and language, I really enjoy when filmmakers try to show how these machine minds see, hear, and process information differently.

In Colossus: The Forbin Project, when Unity begins issuing demands, one of its first instructions is to outfit the Computer Programming Office (CPO) with wall-mounted video cameras that it can access and control. Once this network of cameras is installed, Forbin gives Unity a tour of the space, introducing it visually and spatially to a place it has only known as an abstract node network. During this tour, the audience is also introduced to Unity’s point-of-view, which includes an overlay consisting of several parts.

The first part is a white overlay of rule lines and MICR characters that cluster around the edge of the frame. These graphics do not change throughout the film, whether Unity is looking at Forbin in the CPO, carefully watching for signs of betrayal in a missile silo, or creepily keeping an “eye” on Forbin and Markham’s date for signs of deception.

In these last two screen grabs, you see the second part of the Unity POV, which is a focus indicator. This overlay appears behind the white bits; it’s a blue translucent overlay with a circular hole revealing true color. The hole shows where Unity is focusing. This indicator appears only occasionally and can change size and position. It operates independently of the camera’s optical zoom, as we see in the shots below from Forbin’s tour.

A first augmented computer PoV? 🥇

When writing about computer PoVs before, I have cited Westworld as the first augmented one, since we see things from The Gunslinger’s infrared-vision eyes in the persistence-hunting sequences. (2001: A Space Odyssey came out two years prior to Colossus, but its computer PoV shots are not augmented.) And Westworld came out three years after Colossus, so until it is unseated, I’m going to regard this as the first augmented computer PoV in cinema. (Even the usually encyclopedic TV Tropes doesn’t list this one at the time of publishing.) It probably blew audiences’ minds as it was.

“Colossus, I am Forbin.”

And as such, we should cut it a little slack for not meeting our more literate modern standards. It was forging new territory. Even so, it’s still pretty bad.

Real world computer vision

Though computer vision is always advancing, it’s safe to say that an AI would be looking at the flat images and seeking to understand the bits salient to its goals. In the case of self-driving cars, that means finding the road, reading signs and road markings, identifying objects and plotting their trajectories in relation to the vehicle’s own trajectory in order to avoid collisions, and wayfinding to the destination, all compared against known models of signs, conveyances, laws, maps, and databases. Any of these are good fodder for sci-fi visualization.

Source: Medium article about the state of computer vision in Russia, 2017.
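To make that concrete, here is a minimal sketch of just one of those jobs: finding lane markings in a single dashcam frame. It assumes OpenCV and NumPy are available, and the file names are hypothetical stand-ins.

```python
# Toy lane-marking pass: edges -> road region -> candidate line segments.
# Assumes OpenCV (pip install opencv-python); "dashcam.png" is hypothetical.
import cv2
import numpy as np

frame = cv2.imread("dashcam.png")
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)                  # keep only strong edges

# Mask to the lower half of the frame, where lane markings live.
mask = np.zeros_like(edges)
mask[edges.shape[0] // 2:, :] = 255
edges = cv2.bitwise_and(edges, mask)

# Fit line segments to the surviving edges: candidate lane markings.
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=50,
                        minLineLength=40, maxLineGap=20)
if lines is not None:
    for x1, y1, x2, y2 in lines.reshape(-1, 4):
        cv2.line(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)

cv2.imwrite("annotated.png", frame)               # the augmented PoV, such as it is
```

Note that the output is an annotation layered over the feed, one that changes every frame as the world changes. That, as we’ll see, is exactly what Unity’s overlay fails to do.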

Unity’s concerns would be its goal of ending war, derived subgoals and plans to achieve those goals, constant scenario testing, how it is regarded by humans, identification of individuals, and the trustworthiness of those humans. There are plenty of things that could be augmented, but that would require more than we see here.

Unity Vision looks nothing like this

I don’t consider it worth detailing the specific characters in the white overlay, or backworlding some meaning into the rule lines, because the rule overlay does not change over the course of the movie. In the book Make It So: Interaction Design Lessons from Sci-fi (Chapter 8, Augmented Reality), I identified the types of awareness such overlays could show: sensor output, location awareness, context awareness, and goal awareness. But each of these requires change over time to be useful. So this static overlay seems not just pointless; it risks covering up important details that the AI might need.

Compare the computer vision of The Terminator.

Many times you can excuse computer-PoV shots as technical legacy, that is, a debugging tool that developers built for themselves while developing the AI, and which the AI now uses for itself. In this case, it’s heavily implied that Unity provided the specifications for this system itself, so that doesn’t make sense.

The focus indicator does change over time, but it indicates focus in a way that, again, obscures other information in the visual feed and so is not in Unity’s interest. Color spaces are part of the way computers understand what they’re seeing, and there is no reason Unity should make that harder on itself, even if it is a super AI.
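To put a rough number on what the blue wash costs, here is a quick sketch, assuming OpenCV and NumPy, with a hypothetical input frame. It approximates the tint as a 50% alpha blend, which compresses the usable range of every color channel outside the focus hole.

```python
# How much signal does a translucent blue wash destroy?
# Assumes OpenCV and NumPy; "frame.png" is a hypothetical input.
import cv2
import numpy as np

frame = cv2.imread("frame.png")
blue = np.zeros_like(frame)
blue[:, :, 0] = 255                      # OpenCV stores BGR, so this is pure blue

# Unity's tint, approximated as a 50/50 alpha blend over the whole frame.
tinted = cv2.addWeighted(frame, 0.5, blue, 0.5, 0)

# The blend pulls every pixel toward the overlay color, so any downstream
# classifier gets measurably less variance to work with.
for name, img in (("original", frame), ("tinted", tinted)):
    print(name, "per-channel std dev:", img.reshape(-1, 3).std(axis=0))
```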

Largely extradiegetic

So, since a diegetic reading comes up empty, we have to look at this extradiegetically: as a tool for the audience to understand when they’re seeing through Unity’s eyes rather than the movie’s, and, via the focus indicator, what the AI is inspecting.

As such, it was probably pretty successful in the 1970s at instantly signaling computer-ness.

One reason is the typeface. The characters are derived from MICR, which stands for magnetic ink character recognition. It was established in the 1950s as a way to computerize check processing. Notably, the original font had only numerals and four control characters, no alphabetic ones.

Note also that these characters bear a stylistic resemblance to the ones seen in the film but are not the same. Compare the 0 character here with the one in the screenshots, where that character gets a blob in the lower-right stroke.

I want to give a shout-out to the filmmakers for not having this creeper scene focus on lascivious details, like butts or breasts. It’s a machine looking for signs of deception, and things like hands, microexpressions, and, so the song goes, kisses are more telling.

Still, MICR was a genuinely high-tech typeface of the time. The adult members of the audience would certainly have encountered the “weird” font in their personal lives while looking at checks, and likely understood its purpose, so it was a good choice for 1970, even if the details were off.

Another reason is the inscrutability of the lines. Why are they there, in just that way? Their inscrutability is the point. Most audiences regard technology and computers as having arcane reasons for being the way they are, and these rectilinear lines with odd greebles and nurnies evoke that same sensibility. All the whirring gizmos and bouncing bar charts of modern sci-fi interfaces exhibit the same kind of FUIgetry.

So while the overlay had little to do with the substance of computer vision, its heart was in the right place: it evoked computer-y-ness.

Dat Ending

At the very end of the film, though, after Unity asserts that in time humans will come to love it, Forbin staunchly says, “Never.” Then the film passes into a sequence where it is hard to tell whether it’s meant to be diegetic or not.

In the first beat, the screen breaks into four different camera angles of Forbin at once. (The overlay is still there, as if this were from a single camera.)

This says more about computer vision than even the FUIgetry.

This sense of multiples continues in the second beat, as multiple shots repeat in a grid. The grid is clipped to a big circle that shrinks to a point and ends the film in a moment of blackness before credits roll.

Since it happens right before the credits, and it has no precedent in the film, I read it not as part of the movie, but as a title sequence. And that sucks. I wish wish wish this had been the standard Unity-view from the start. It illustrates that Unity is not gathering its information from a single stereoscopic image, like humans and most vertebrates do, but from multiple feeds simultaneously. That’s alien. Not even insectoid, but something unique to how this AI senses the world.
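If you want to imagine that difference in engineering terms, here is a tiny sketch of an agent that consumes many feeds at once rather than one stereoscopic pair. It assumes OpenCV, and the four camera indices are hypothetical stand-ins for the CPO cameras.

```python
# Many eyes, one mind: grab a frame from several cameras simultaneously.
# Assumes OpenCV; camera indices 0-3 are hypothetical CPO cameras.
import cv2
from concurrent.futures import ThreadPoolExecutor

def grab(index: int):
    """Capture a single frame from one camera, or None on failure."""
    cap = cv2.VideoCapture(index)
    ok, frame = cap.read()
    cap.release()
    return index, (frame if ok else None)

with ThreadPoolExecutor() as pool:
    # Unity would reason over all of these at once, not a single fused view.
    frames = dict(pool.map(grab, range(4)))
```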

Klaatunian interior


When the camera first follows Klaatu into the interior of his spaceship, we witness the first gestural interface seen in the survey. To turn on the lights, Klaatu raises a hand in the air before a double column of small lights embedded in the wall to the right of the door. He holds his hand up for a moment, and then smoothly brings it down before these lights. In response, the lights on the wall extinguish and an overhead light illuminates. He repeats this gesture on a similar double column of lights to the left of the door.

The nice thing to note about this gesture is that it is simple and easy to execute. The mapping also has a nice physical referent: When the hand goes down like the sun, the lights dim. When the hand goes up like the sun, the lights illuminate.

He then approaches an instrument panel with an array of translucent controls, like a small keyboard with extended plastic keys. As before, he holds his hand a moment at the top of the controls before swiping his hand in the air toward the bottom of the controls. In response, the panel illuminates. He repeats this on a similar panel nearby.

Having activated all of these elements, he begins to speak in his alien tongue to a circular, strangely lit panel on the wall. (The film gives no indication as to the purpose of his speech, so no conclusions about its interface can be drawn.)


Gort also operates the translucent panels with a wave of his hand. To her credit, perhaps, Helen does not try to control the panels, but we can presume that, as with the spaceship itself, some security mechanism prevents unauthorized control.

Missing affordances

Who knows how Klaatu perceives this panel. He’s an alien, after all. But for us mere humans, the interface is confounding. There are no labels to help us understand what controls what. The physical affordances of different parts of the panels imply sliding along the surface, touch, or turning, not gesture. Gestural affordances are tricky at best, but these translucent shapes actually signal something different altogether.

Overcomplicated workflow

And you have to wonder why he has to go through this rigmarole at all. Why must he turn on each section of the interface, one by one? Couldn’t there be just one “on” button? And isn’t he just doing one thing, transmitting? He doesn’t even seem to select a recipient, so presumably the system is hardwired to HQ. Seriously, can’t he just turn it on?

Why is this UI even here?

Or better yet, can’t the microphone just detect when he’s nearby, illuminate to let him know it’s ready, and subtly confirm when it’s “hearing” him? That would be the agentive solution.
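To sketch what “agentive” means here, consider a tiny state machine. The states and sensor inputs are hypothetical stand-ins, but the behavior is the whole design: the device does the work of noticing.

```python
# Agentive transmitter: wake on proximity, light up to show readiness,
# and pulse to confirm while "hearing" speech. Inputs are hypothetical.
from enum import Enum, auto

class State(Enum):
    IDLE = auto()       # dark; nobody near
    READY = auto()      # lit; inviting speech
    HEARING = auto()    # pulsing; confirming it hears the speaker

def step(person_nearby: bool, speech_detected: bool) -> State:
    if not person_nearby:
        return State.IDLE
    return State.HEARING if speech_detected else State.READY

# Klaatu walks up and starts talking; no gestural rigmarole required.
assert step(person_nearby=True, speech_detected=False) is State.READY
assert step(person_nearby=True, speech_detected=True) is State.HEARING
```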

Maybe it needs some lockdown: Power

OK. Fine. If this transmission consumes a significant amount of power, then an even more deliberate activation is warranted, perhaps the turning of a key. And once on, you would expect to see some indication of the rate of power depletion and remaining power reserves, which we don’t see, so this is pretty doubtful.

Maybe it needs some lockdown: Security

This is the one concern that might warrant all the craziness. That the interface has no affordance means that Joe Human Schmo can’t just walk in and turn it on. (In fact, the misleading bits help as a plausible diversion.) The “workflow” then is actually a gestural combination that unlocks the interface and starts it recording. Even if Helen accidentally discovered the gestural aspect, there’s little to no way she could figure out those particular gestures and start intergalactic calls for help. And remembering that Klaatu is, essentially, a space ethics recon cop, this level of security might make sense.
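Read that way, activation is just a combination lock whose digits are gestures. Here is a minimal sketch of the idea; the gesture names are hypothetical.

```python
# Gesture-combination lock: the ship stores only a digest of the secret
# sequence and compares in constant time. Gesture names are hypothetical.
import hashlib
import hmac

SECRET_SEQUENCE = ("hold-above", "sweep-down", "hold-above", "sweep-down")

def digest(gestures):
    """Hash a sequence of recognized gestures into a fixed-size token."""
    return hashlib.sha256("|".join(gestures).encode()).digest()

STORED = digest(SECRET_SEQUENCE)

def unlock(observed):
    # Constant-time comparison: timing leaks nothing to a snooping Helen.
    return hmac.compare_digest(digest(observed), STORED)

assert unlock(("hold-above", "sweep-down", "hold-above", "sweep-down"))
assert not unlock(("sweep-down",))   # a lucky guess still fails
```

Even discovering that gestures matter would not reveal which sequence matters, which is exactly how Helen fares.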