Fritzes 2026 Best Believable

The Fritzes award honors the best interfaces in a full-length motion picture in the past year. Interfaces play a special role in our movie-going experience, and are a craft all their own that does not otherwise receive focused recognition.

Today we’ll be covering Best Believable. These movies’ interfaces adhere to solid human-computer interaction principles and offer believable interactions. They engage us in the story world by being convincing.

The 2026 Award goes to: The Running Man

This second adaptation of Stephen King’s novel knocks it out of the park for the plot-central interfaces: The runner cuff and R-Cam box, the hideous sousveillance phone app for “fans”, the service design of the “free-v” show, and the in-home snitch interfaces. They lean towards narrative (missing a few things real-world counterparts would need), but all help articulate this dystopian world and the circumstances that drive the action. Moreover, I feel quite certain not making good real-world models of these horrible things is the right thing to do, especially given *gestures vaguely at the kakistocracy*.

On top of that it also has lots of awesome everyday interfaces, and it takes a level of commitment on the part of the filmmakers to go that deep in the worldbuilding. There’s a videophone interface with shades of Blade Runner. There’s a mailbox that signals its readiness and lifts off immediately after receiving a letter. (Though I would have flipped those red and green colors, so red meant “don’t put mail in here” and green meant “ready to receive”, but my invitation was lost in the mail.) The fare interfaces in the taxi. The self-driving interface of the citizen car. The piloting interfaces aboard the network plane. It’s all uncluttered, straightforward, and believable. Really well done, really well presented, and that’s hard to do in intense-action movies.

Also check out: War of the Worlds (2025) 

It got universally panned. Fair enough: neither ubiquitous government surveillance nor the current DHS bears valorization, and the virus-but-it’s-digital twist was already done. Still, I am impressed that this take on the classic Wells story is told almost entirely through interfaces, and each of them is detailed and mostly realistic. The editing around the interfaces can be dizzying, and I wondered why William Radford had to do so much digital hunting at the beginning when an assistant should have been guiding his attention. But it’s impressive to bring that tale to life mostly through this unsung medium.

Also check out: Companion

With soft echoes of the interfaces in Westworld (2016), the interfaces in Companion control android and gynoid companions. (Yes, that term is deliberately coy.) They are clean and simple, which underscores the robots’ horror that they are under that much control by their owners.

My hackles are raised by “Intelligence” being a single slider. Intelligence is much more complicated than that, and the notion that it’s a single scalar variable has done a lot of damage over time. Even a little expando control would have pointed at the idea that we’re looking at a simplification. I also wish they’d provided a live preview of the eye color, because even in its intended use (an owner controlling their companion’s eye color) this control has them glancing up to see the effect and then back down again to adjust, which is not a satisfying feedback loop. I use this very control as an example of a “plan” assistant in my new book. Hey, all of Hollywood: Buy it!

Next up: The Best Narrative interfaces from 2025

Deckard’s Photo Inspector

Back to Blade Runner. I mean, the pandemic is still pandemicking, but maybe this will be a nice distraction while you shelter in place. Because you’re smart, sheltering in place as much as you can, and not injecting disinfectants. And, like so many other technologies in this film, this will take a while to deconstruct, critique, and reimagine.

Description

Doing his detective work, Deckard retrieves a set of snapshots from Leon’s hotel room, and he brings them home. Something in the one pictured above catches his eye, and he wants to investigate it in greater detail. He takes the photograph and inserts it in a black device he keeps in his living room.

Note: I’ll try to describe this interaction in text, but it is much easier to conceptualize after viewing it. Owing to copyright restrictions, I cannot upload this length of video with the original audio, so I have added pre-rendered closed captions to it, below. All dialogue in the clip is Deckard.

Deckard does digital forensics, looking for a lead.

He inserts the snapshot into a horizontal slit and turns the machine on. A thin, horizontal orange line glows on the left side of the front panel. A series of seemingly random-length orange lines begin to chase one another in a single-row space that stretches across the remainder of the panel and continue to do so throughout Deckard’s use of it. (Imagine a news ticker, running backwards, where the “headlines” are glowing amber lines.) This seems useless and an absolutely pointless distraction for Deckard, putting high-contrast motion in his peripheral vision, which fights for attention with the actual, interesting content down below.

If this is distracting you from reading, YOU SEE MY POINT.

After a second, the screen reveals a blue grid, behind which the scan of the snapshot appears. He stares at the image in the grid for a moment, and speaks a set of instructions, “Enhance 224 to 176.”

In response, three data points appear overlaying the image at the bottom of the screen. Each has a two-letter label and a four-digit number, e.g. “ZM 0000 NS 0000 EW 0000.” The NS and EW—presumably North-South and East-West coordinates, respectively—immediately update to read, “ZM 0000 NS 0197 EW 0334.” After updating the numbers, the screen displays a crosshairs, which target a single rectangle in the grid.

A new rectangle then zooms in from the edges to match the targeted rectangle, as the ZM number—presumably zoom, or magnification—increases. When the animated rectangle reaches the targeted rectangle, its outline blinks yellow a few times. Then the contents of the rectangle are enlarged to fill the screen, in a series of steps which are punctuated with sounds similar to a mechanical camera aperture. The enlargement is perfectly resolved. The overlay disappears until the next set of spoken commands. The system response between Deckard’s issuing the command and the device’s showing the final enlarged image is about 11 seconds.

Deckard studies the new image for a while before issuing another command. This time he says, “Enhance.” The image enlarges in similar clacking steps until he tells it, “Stop.”

Other instructions he is heard to give include “move in, pull out, track right, center in, pull back, center, and pan right.” Some are discrete instructions, such as “Track 45 right,” while others are relative commands that the system obeys until told to stop, such as “Go right.”

Using such commands he isolates part of the image that reveals an important clue, and he speaks the instruction, “Give me a hard copy right there.” The machine prints the image, which Deckard uses to help find the replicant pictured.

This image helps lead him to Zhora.

I’d like to point out one bit of sophistication before the critique. Deckard can issue a command with or without a parameter, and the inspector knows what to do. For example, “Track 45 right” and “Track right.” Without the parameter, it will just do the thing repeatedly until told to stop. That lets Deckard issue the same basic command both when he knows exactly where he wants to look and when he doesn’t know exactly what he’s looking for. That’s a nice feature of the language design.
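This with-or-without-a-parameter pattern is easy to model. Here’s a minimal sketch (the function name and token handling are my invention, just to illustrate the grammar): the same parser serves both forms, and a missing amount signals a continuous command that runs until a “stop.”

```python
# Sketch of the inspector's command grammar (all names are hypothetical).
# "Track 45 right" -> a discrete move by a given amount.
# "Track right"    -> an open-ended move, continuous until "stop".

def parse_command(utterance):
    """Split a spoken command into (verb, amount, direction).

    amount is None for open-ended commands, which a controller
    would run continuously until it hears "stop".
    """
    tokens = utterance.lower().split()
    verb = tokens[0]
    amount = None
    direction = None
    for tok in tokens[1:]:
        if tok.isdigit():
            amount = int(tok)
        else:
            direction = tok
    return verb, amount, direction

# Discrete: do it once, by this much.
assert parse_command("Track 45 right") == ("track", 45, "right")
# Open-ended: repeat until told to stop.
assert parse_command("Track right") == ("track", None, "right")
```

The controller only needs one branch: if `amount` is `None`, keep going until interrupted; otherwise, move by that much and halt.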

But still, asking him to provide step-by-step instructions in this clunky way feels like some high-tech Big Trak. (I tried to find a reference that was as old as the film.) And that’s not all…

Some critiques, as it is

  • Can I go back and mention that amber distracto-light? Because it’s distracting. And pointless. I’m not mad. I’m just disappointed.
  • It sure would be nice if any of the numbers on screen made sense, and had any bearing on the numbers Deckard speaks, at any time during the interaction. For instance, the initial zoom (I checked in Photoshop) is around 304%, which is neither the 224 nor the 176 that Deckard speaks.
  • It might be that each square has a number, and he simply has to name the two squares at the extents of the zoom he wants, letting the machine find the extents, but where is the labeling? Did he have to memorize an address for each pixel? How does that work at arbitrary levels of zoom?
  • And if he’s memorized it, why show the overlay at all?
  • Why the seizure-inducing flashing in the transition sequences? Sure, I get that lots of technologies have unfortunate effects when constrained by mechanics, but this is digital.
  • Why is the printed picture so unlike the still image where he asks for a hard copy?
  • Gaze at the reflection in Ford’s hazel, hazel eyes, and it’s clear he’s playing Missile Command, rather than paying attention to this interface at all. (OK, that’s the filmmaker’s issue, not a part of the interface, but still, come on.)
The photo inspector: My interface is up HERE, Rick.

How might it be improved for 1982?

So if 1982 Ridley Scott was telling me in post that we couldn’t reshoot Harrison Ford, and we had to make it just work with what we had, here’s what I’d do…

Squash the grid so the cells match the 4:3 ratio of the NTSC screen. Overlay the address of each cell, while highlighting column and row identifiers at the edges. Have the first cell’s outline illuminate as he speaks it, and have the outline expand to encompass the second named cell. Then zoom, removing the cell labels during the transition. When at anything other than full view, display a map across four cells that shows the zoom visually in the context of the whole.

Rendered in glorious 4:3 NTSC dimensions.

With this interface, the structure of the existing conversation makes more sense. When Deckard said, “Enhance 203 to 608” the thing would zoom in on the mirror, and the small map would confirm.

The numbers wouldn’t match up, but it’s pretty obvious from the final cut that Scott didn’t care about that (or, more charitably, ran out of time). Anyway I would be doing this under protest, because I would argue this interaction needs to be fixed in the script.

How might it be improved for 2020?

What’s really nifty about this technology is that it’s not just a photograph. Look closely at the scene, and Deckard isn’t just doing CSI Enhance! commands (or, to be less mocking, AI upscaling). He’s using the photo inspector to look around corners and at objects that are reconstructed from the smallest reflections. So we can think of the interaction as if he’s controlling a drone through a 3D still life, looking for a lead to help him further the case.

With that in mind, let’s talk about the display.

Display

To redesign it, we have to decide at a foundational level how we think this works, because that will color what the display looks like. Is this all data that’s captured from some crazy 3D camera and available in the image? Or is it being inferred from details in the two-dimensional image? Let’s call the first the 3D capture, and the second the 3D inference.

If we decide this is a 3D capture, then all the data that he observes through the machine has the same degree of confidence. If, however, we decide this is a 3D inferrer, Deckard needs to treat the inferred data with more skepticism than the data the camera directly captured. The 3D inferrer is the harder problem, and it raises some issues that we must deal with in modern AI, so let’s just say that’s the way this speculative technology works.

The first thing the display should do is make it clear what is observed and what is inferred. How you do this is partly a matter of visual design and style, but partly a matter of diegetic logic. The first pass would be to render everything in the camera frustum photo-realistically, and then render everything outside of that in a way that signals its confidence level. The comp below illustrates one way this might be done.

Modification of a pair of images found on Evermotion
  • In the comp, Deckard has turned the “drone” from the “actual photo,” seen off to the right, toward the inferred space on the left. The monochrome color treatment provides that first high-confidence signal.
  • In the scene, the primary inference would come from reading the reflections in the disco ball overhead lamp, maybe augmented with plans for the apartment that could be found online, or maybe purchase receipts for appliances, etc. Everything it can reconstruct from the reflection and high-confidence sources has solid black lines, a second-level signal.
  • The smaller knickknacks that are out of the reflection of the disco ball, and implied from other, less reflective surfaces, are rendered without the black lines and blurred. This provides a signal that the algorithm has a very low confidence in its inference.

This is just one (not very visually interesting) way to handle it, but should illustrate that, to be believable, the photo inspector shouldn’t have a single rendering style outside the frustum. It would need something akin to these levels to help Deckard instantly recognize how much he should trust what he’s seeing.
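The tiering described in those bullets boils down to a simple mapping from provenance and confidence to a render style. A minimal sketch, with invented threshold values and style names, just to show the logic:

```python
# Sketch: map inference confidence to a render style, so Deckard can
# instantly see how much to trust each object. The thresholds and
# style names are invented for illustration.

def render_style(in_frustum, confidence):
    """Pick a render style for an object in the reconstructed scene."""
    if in_frustum:
        return "photorealistic"          # directly observed pixels
    if confidence >= 0.8:
        return "monochrome-solid-lines"  # high-confidence inference
    return "monochrome-blurred"          # low-confidence guess

assert render_style(True, 1.0) == "photorealistic"
assert render_style(False, 0.9) == "monochrome-solid-lines"
assert render_style(False, 0.3) == "monochrome-blurred"
```

The point isn’t these particular styles; it’s that the style is a function of confidence, so trust is legible at a glance.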

Flat screen or volumetric projection?

Modern CGI loves big volumetric projections. (e.g. it was the central novum of last year’s Fritz winner, Spider-Man: Far From Home.) And it would be a wonderful juxtaposition to see Deckard in a holodeck-like recreation of Leon’s apartment, with all the visual treatments described above.

But…

Also seriously who wants a lamp embedded in a headrest?

…that would kind of spoil the mood of the scene. This isn’t just about Deckard’s finding a clue, we also see a little about who he is and what his life is like. We see the smoky apartment. We see the drab couch. We see the stack of old detective machines. We see the neon lights and annoying advertising lights swinging back and forth across his windows. Immersing him in a big volumetric projection would lose all this atmospheric stuff, and so I’d recommend keeping it either a small contained VP, like we saw in Minority Report, or just keep it a small flat screen.


OK, now that we have an idea of how the display should (and shouldn’t) look, let’s move on to talk about the inputs.

Inputs

To talk about inputs, then, we have to return to a favorite topic of mine, and that is the level of agency we want for the interaction. In short, we need to decide how much work the machine is doing. Is the machine just a manual tool that Deckard has to manipulate to get it to do anything? Or does it actively assist him? Or, lastly, can it even do the job while his attention is on something else—that is, can it act as an agent on his behalf? Sophisticated tools can be a blend of these modes, but for now, let’s look at them individually.

Manual Tool

This is how the photo inspector works in Blade Runner. It can do things, but Deckard has to tell it exactly what to do. But we can still improve it in this mode.

We could give him well-mapped physical controls, like a remote control for this conceptual drone. Flight controls wind up being a recurring topic on this blog (and even came up already in the Blade Runner reviews with the Spinners) so I could go on about how best to do that, but I think that a handheld controller would ruin the feel of this scene, like Deckard was sitting down to play a video game rather than do off-hours detective work.

Special edition made possible by our sponsor, Tom Nook.
(I hope we can pay this loan back.)

Similarly, we could talk about a gestural interface, using some of the synecdochic techniques we’ve seen before in Ghost in the Shell. But again, this would spoil the feel of the scene, having him look more like John Anderton in front of a tiny-TV version of Minority Report’s famous crime scrubber.

One of the things that gives this scene its emotional texture is that Deckard is drinking a glass of whiskey while doing his detective homework. It shows how low he feels. Throwing one back is clearly part of his evening routine, so much a habit that he does it despite being preoccupied about Leon’s case. How can we keep him on the couch, with his hand on the lead crystal whiskey glass, and still investigating the photo? Can he use it to investigate the photo?

Here I recommend a bit of ad-hoc tangible user interface. I first backworlded this for The Star Wars Holiday Special, but I think it could work here, too. Imagine that the photo inspector has a high-resolution camera on it, and the interface allows Deckard to declare any object that he wants as a control object. After the declaration, the camera tracks the object against a surface, using the changes to that object to control the virtual camera.

In the scene, Deckard can declare the whiskey glass as his control object, and the arm of his couch as the control surface. Of course the virtual space he’s in is bigger than the couch arm, but it could work like a mouse and a mousepad. He can just pick it up and set it back down again to extend motion.

This scheme takes into account all movement except vertical lift and drop. This could be a gesture or a spoken command (see below).
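The mouse-and-mousepad analogy, including the “pick it up and set it back down” clutch, can be sketched in a few lines. Everything here is hypothetical (class name, gain value, coordinate handling); it just demonstrates the relative-motion mapping:

```python
# Sketch of the mouse-like "clutch" mapping: while the tracked glass is
# down on the couch arm, its displacement drives the virtual drone; when
# Deckard lifts it, tracking pauses, so setting it back down re-anchors
# the reference point and lets him extend the motion. All names invented.

class TangibleController:
    def __init__(self, gain=10.0):
        self.gain = gain          # couch-arm units -> scene-space units
        self.anchor = None        # last tracked glass position
        self.drone = [0.0, 0.0]   # virtual camera position

    def update(self, glass_pos, lifted):
        if lifted:                # clutch: ignore motion while airborne
            self.anchor = None
            return self.drone
        if self.anchor is not None:
            self.drone[0] += (glass_pos[0] - self.anchor[0]) * self.gain
            self.drone[1] += (glass_pos[1] - self.anchor[1]) * self.gain
        self.anchor = glass_pos
        return self.drone

c = TangibleController()
c.update((0.0, 0.0), lifted=False)                         # set down: anchors
assert c.update((1.0, 0.0), lifted=False) == [10.0, 0.0]   # slide right
c.update((1.0, 0.0), lifted=True)                          # pick up: clutch
c.update((0.0, 0.0), lifted=False)                         # set down, re-anchor
assert c.update((1.0, 0.0), lifted=False) == [20.0, 0.0]   # motion extends
```

Rotation (twisting the glass) and tilt would add more channels to the same scheme; only vertical lift and drop need the separate gesture or spoken command mentioned above.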

Going with this interaction model means Deckard can use the whiskey glass, allowing the scene to keep its texture and feel. He can still drink and get his detective on.

Tipping the virtual drone to the right.

Assistant Tool

Indirect manipulation is helpful for when Deckard doesn’t know what he’s looking for. He can look around, and get close to things to inspect them. But when he knows what he’s looking for, he shouldn’t have to go find it. He should be able to just ask for it, and have the photo inspector show it to him. This requires that we presume some AI. And even though Blade Runner clearly includes General AI, let’s presume that that kind of AI has to be housed in a human-like replicant, and can’t be squeezed into this device. Instead, let’s just extend the capabilities of Narrow AI.

Some of this will be navigational and specific, “Zoom to that mirror in the background,” for instance, or, “Reset the orientation.” Some will be more abstract and content-specific, e.g. “Head to the kitchen” or “Get close to that red thing.” If it had gaze detection, he could even indicate a location by looking at it. “Get close to that red thing there,” for example, while looking at the red thing. Given the 3D inferrer nature of this speculative device, he might also want to trace the provenance of an inference, as in, “How do we know this chair is here?” This implies natural language generation as well as understanding.

There’s nothing stopping him from using the same general commands heard in the movie, but I doubt anyone would want to use those when they have commands like this and the object-at-hand controller available.

Ideally Deckard would have some general search capabilities as well, to ask questions and test ideas. “Where were these things purchased?” or subsequently, “Is there video footage from the stores where he purchased them?” or even, “What does that look like to you?” (The correct answer would be, “Well that looks like the mirror from the Arnolfini portrait, Ridley…I mean…Rick*”) It can do pattern recognition and provide as much extra information as it has access to, just like Google Lens or IBM Watson image recognition does.

*Left: The convex mirror in Leon’s 21st century apartment.
Right: The convex mirror in Arnolfini’s 15th century apartment

Finally, he should be able to ask after simple facts to see if the inspector knows or can find it. For example, “How many people are in the scene?”

All of this still requires that Deckard initiate the action, and we can augment it further with a little agentive thinking.

Agentive Tool

To think in terms of agents is to ask, “What can the system do for the user, but not requiring the user’s attention?” (I wrote a book about it if you want to know more.) Here, the AI should be working alongside Deckard. Not just building the inferences and cataloguing observations, but doing anomaly detection on the whole scene as it goes. Some of it is going to be pointless, like “Be aware the butter knife is from IKEA, while the rest of the flatware is Christofle Lagerfeld. Something’s not right, here.” But some of it Deckard will find useful. It would probably be up to Deckard to review summaries and decide which were worth further investigation.

It should also be able to help him with his goals. For example, the police had Zhora’s picture on file. (And her portrait even rotates in the dossier we see at the beginning, so it knows what she looks like in 3D for very sophisticated pattern matching.) The moment the agent—while it was reverse ray tracing the scene and reconstructing the inferred space—detects any faces, it should run the face through a most wanted list, and specifically Deckard’s case files. It shouldn’t wait for him to find it. That again poses some challenges to the script. How do we keep Deckard the hero when the tech can and should have found Zhora seconds after being shown the image? It’s a new challenge for writers, but it’s becoming increasingly important for believability.

Though I’ve never figured out why she has a snake tattoo here (it seems really important to the plot), but then when Deckard finally meets her, it has disappeared.

Scene

  • Interior. Deckard’s apartment. Night.
  • Deckard grabs a bottle of whiskey, a glass, and the photo from Leon’s apartment. He sits on his couch and places the photo on the coffee table.
  • Deckard
  • Photo inspector.
  • The machine on top of a cluttered end table comes to life.
  • Deckard
  • Let’s look at this.
  • He points to the photo. A thin line of light sweeps across the image. The scanned image appears on the screen, pulled in a bit from the edges. A label reads, “Extending scene,” and we see wireframe representations of the apartment outside the frame begin to take shape. A small list of anomalies begins to appear to the left. Deckard pours a few fingers of whiskey into the glass. He takes a drink before putting the glass on the arm of his couch. Small projected graphics appear on the arm facing the inspector.
  • Deckard
  • OK. Anyone hiding? Moving?
  • Photo inspector
  • No and no.
  • Deckard
  • Zoom to that arm and pin to the face.
  • He turns the glass on the couch arm counterclockwise, and the “drone” revolves around to show Leon’s face, with the shadowy parts rendered in blue.
  • Deckard
  • What’s the confidence?
  • Photo inspector
  • 95.
  • On the side of the screen the inspector overlays Leon’s police profile.
  • Deckard
  • Unpin.
  • Deckard lifts his glass to take a drink. He moves from the couch to the floor to stare more intently and places his drink on the coffee table.
  • Deckard
  • New surface.
  • He turns the glass clockwise. The camera turns and he sees into a bedroom.
  • Deckard
  • How do we have this much inference?
  • Photo inspector
  • The convex mirror in the hall…
  • Deckard
  • Wait. Is that a foot? You said no one was hiding.
  • Photo inspector
  • The individual is not hiding. They appear to be sleeping.
  • Deckard rolls his eyes.
  • Deckard
  • Zoom to the face and pin.
  • The view zooms to the face, but the camera is level with her chin, making it hard to make out the face. Deckard tips the glass forward and the camera rises up to focus on a blue, wireframed face.
  • Deckard
  • That look like Zhora to you?
  • The inspector overlays her police file.
  • Photo inspector
  • 63% of it does.
  • Deckard
  • Why didn’t you say so?
  • Photo inspector
  • My threshold is set to 66%.
  • Deckard
  • Give me a hard copy right there.
  • He raises his glass and finishes his drink.

This scene keeps the texture and tone of the original, leans on the limitations of Narrow AI to let Deckard be the hero, and doesn’t have him programming a virtual Big Trak.
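The threshold gag at the end of the scene (“63% of it does” / “My threshold is set to 66%”) is also a real design decision: when does an assistant volunteer information versus answer only when asked? A minimal sketch, using the numbers from the scene but an entirely invented API:

```python
# Sketch of the inspector's match-threshold behavior from the scene:
# it only volunteers a face match above its configured threshold, but
# reports the raw score when asked directly. The 0.66 threshold and
# 0.63 score come from the scene dialogue; everything else is invented.

class FaceMatcher:
    def __init__(self, threshold=0.66):
        self.threshold = threshold

    def should_volunteer(self, score):
        """Proactively surface a match only above the threshold."""
        return score >= self.threshold

    def answer_direct_query(self, score):
        """When asked directly, report the raw score regardless."""
        return f"{round(score * 100)}% of it does."

m = FaceMatcher()
assert not m.should_volunteer(0.63)                 # stays quiet about Zhora
assert m.answer_direct_query(0.63) == "63% of it does."
```

Tuning that threshold is exactly the kind of thing a user like Deckard should be able to do himself, since a detective’s tolerance for false positives differs from, say, a border checkpoint’s.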

Unity Vision

One of my favorite challenges in sci-fi is showing how alien an AI mind is. (It’s part of what makes Ex Machina so compelling, and the end of Her, and why Data from Star Trek: The Next Generation always read like a dopey, Pinocchio-esque narrative tool. But a full comparison is for another post.) Given that screen sci-fi is a medium of light, sound, and language, I really enjoy when filmmakers try to show how they see, hear, and process this information differently.

In Colossus: The Forbin Project, when Unity begins issuing demands, one of its first instructions is to outfit the Computer Programming Office (CPO) with wall-mounted video cameras that it can access and control. Once this network of cameras is installed, Forbin gives Unity a tour of the space, introducing it visually and spatially to a place it has only known as an abstract node network. During this tour, the audience is also introduced to Unity’s point-of-view, which includes an overlay consisting of several parts.

The first part is a white overlay of rule lines and MICR characters that cluster around the edge of the frame. These graphics do not change throughout the film, whether Unity is looking at Forbin in the CPO, carefully watching for signs of betrayal in a missile silo, or creepily keeping an “eye” on Forbin and Markham’s date for signs of deception.

In these last two screen grabs, you see the second part of the Unity POV, which is a focus indicator. This overlay appears behind the white bits; it’s a blue translucent overlay with a circular hole revealing true color. The hole shows where Unity is focusing. This indicator appears, occasionally, and can change size and position. It operates independently of the optical zoom of the camera, as we see in the below shots of Forbin’s tour.

A first augmented computer PoV? 🥇

When writing about computer PoVs before, I have cited Westworld as the first augmented one, since we see things from The Gunslinger’s infrared-vision eyes in the persistence-hunting sequences. (2001: A Space Odyssey came out the year prior to Colossus, but its computer PoV shots are not augmented.) And Westworld came out three years after Colossus, so until it is unseated, I’m going to regard this as the first augmented computer PoV in cinema. (Even the usually-encyclopedic TVtropes doesn’t list this one at the time of publishing.) It probably blew audiences’ minds as it was.

“Colossus, I am Forbin.”

And as such, we should cut it a little slack for not meeting our more literate modern standards. It was forging new territory. Even for that, it’s still pretty bad.

Real world computer vision

Though computer vision is always advancing, it’s safe to say that AI would be looking at the flat images and seeking to understand the salient bits per its goals. In the case of self-driving cars, that means finding the road, reading signs and road markers, identifying objects and plotting their trajectories in relation to the vehicle’s own trajectory in order to avoid collisions, and wayfinding to the destination, all compared against known models of signs, conveyances, laws, maps, and databases. Any of these are good fodder for sci-fi visualization.

Source: Medium article about the state of computer vision in Russia, 2017.

Unity’s concerns would be its goal of ending war, derived subgoals and plans to achieve those goals, constant scenario testing, how it is regarded by humans, identification of individuals, and the trustworthiness of those humans. There are plenty of things that could be augmented, but that would require more than we see here.

Unity Vision looks nothing like this

I don’t consider it worth detailing the specific characters in the white overlay, or backworlding some meaning in the rule lines, because the rule overlay does not change over the course of the movie. In the book Make It So: Interaction Design Lessons from Sci-fi, Chapter 8, Augmented Reality, I identified the types of awareness such overlays could show: sensor output, location awareness, context awareness, and goal awareness. Each of these requires change over time to be useful, so this static overlay seems not just pointless; it risks covering up important details that the AI might need.

Compare the computer vision of The Terminator.

Many times you can excuse computer-PoV shots as technical legacy, that is, a debugging tool that developers built for themselves while developing the AI, and which the AI now uses for itself. In this case, it’s heavily implied that Unity provided the specifications for this system itself, so that doesn’t make sense.

The focus indicator does change over time, but it indicates focus in a way that, again, obscures other information in the visual feed and so is not in Unity’s interest. Color spaces are part of the way computers understand what they’re seeing, and there is no reason it should make it harder on itself, even if it is a super AI.

Largely extradiegetic

So, since a diegetic reading comes up empty, we have to look at this extradiegetically. That means as a tool for the audience to understand when they’re seeing through Unity’s eyes—rather than the movie’s—and via the focus indicator, what the AI is inspecting.

As such, it was probably pretty successful in the 1970s to instantly indicate computer-ness.

One reason is the typeface. The characters are derived from MICR, which stands for magnetic ink character recognition. It was established in the 1950s as a way to computerize check processing. Notably, the original font had only numerals and four control characters, no alphabetic ones.

Note also that these characters bear a stylistic resemblance to the ones seen in the film but are not the same. Compare the 0 character here with the one in the screenshots, where it gets a blob in the lower-right stroke.

I want to give a shout-out to the film makers for not having this creeper scene focus on lascivious details, like butts or breasts. It’s a machine looking for signs of deception, and things like hands, microexpressions, and, so the song goes, kisses are more telling.

Still, MICR was a genuinely high-tech typeface of the time. The adult members of the audience would certainly have encountered the “weird” font in their personal lives while looking at checks, and likely understood its purpose, so it was a good choice for 1970, even if the details were off.

Another is the inscrutability of the lines. Why are they there, in just that way? Their inscrutability is the point. Most people in audiences regard technology and computers as having arcane reasons for the way they are, and these rectilinear lines with odd greebles and nurnies invoke that same sensibility. All the whirring gizmos and bouncing bar charts of modern sci-fi interfaces exhibit the same kind of FUIgetry.

So for these reasons, while it had little to do with the substance of computer vision, its heart was in the right place to invoke computer-y-ness.

Dat Ending

At the very end of the film, though, after Unity asserts that in time humans will come to love it, Forbin staunchly says, “Never.” Then the film passes into a sequence where it’s hard to tell whether it’s meant to be diegetic or not.

In the first beat, the screen breaks into four different camera angles of Forbin at once. (The overlay is still there, as if this was from a single camera.)

This says more about computer vision than even the FUIgetry.

This sense of multiples continues in the second beat, as multiple shots repeat in a grid. The grid is clipped to a big circle that shrinks to a point and ends the film in a moment of blackness before credits roll.

Since it happens right before the credits, and it has no precedent in the film, I read it as not part of the movie, but a title sequence. And that sucks. I wish wish wish this had been the standard Unity-view from the start. It illustrates that Unity is not gathering its information from a single stereoscopic image, like humans and most vertebrates do, but from multiple feeds simultaneously. That’s alien. Not even insectoid, but part of how this AI senses the world.

Contact!


Jack lands in a ruined stadium to do some repairs on a fallen drone. After he’s done, the drone takes a while to reboot, so while he waits, Jack’s mind drifts to the stadium and the memories he has of it.

Present information as it might be shared

Vika is in comms with Jack when she notices the alarm signal on her desktop interface. Her screen displays an all-caps red ALERT overlay and a diamond over the unidentified object careening toward him. She yells, “Contact! Left contact!” at Jack.


As Jack hears Vika’s warning, he turns to look, reflexively drawing his pistol as he crouches. While the weapon loads, he notices that the cause of the warning is just a small, not-so-hostile dog.

Although Vika yells about something coming from the left side, by looking at the screen you can tell that it’s more to his back—his 6 or 7 o’clock—than his left. We’re seeing it with time to spare here, and the satellite image is very low-res, so we can cut her some slack. But given all the sensors at its command, the interface would ideally know which way Jack is facing and which way the threat is approaching, so she can convey correct and useful information quickly.

“Contact, at your 6, Jack!”

That’s much more precise and actionable for Jack.
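
As a thought experiment, here’s a minimal sketch of how such a callout might be computed, assuming the system knows Jack’s facing and the contact’s absolute bearing. The function and both parameters are hypothetical, not anything shown in the film:

```python
def clock_direction(facing_deg: float, threat_bearing_deg: float) -> str:
    """Convert an absolute threat bearing into a clock position
    relative to the direction the person is facing."""
    relative = (threat_bearing_deg - facing_deg) % 360
    # Each clock "hour" spans 30 degrees; 12 o'clock is dead ahead.
    hour = round(relative / 30) % 12
    return f"{12 if hour == 0 else hour} o'clock"

# A contact directly behind someone facing north (0 degrees) is at their 6.
print(clock_direction(0, 180))   # "6 o'clock"
print(clock_direction(90, 90))   # "12 o'clock"
```

With something like this, the interface itself could render “6 o’clock” next to the diamond, and Vika would only have to read it aloud.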


Don’t cover information

It would help to put the ALERT overlay somewhere other than on top of Jack, since there it can obscure useful information. Perhaps the “chrome” of the interface could turn red instead? Not as instantly readable for the audience, but if we’re designing for Vika…

Provide specifics

Another issue is that neither the satellite image nor the interface helps Vika identify what turns out to be just a dog. Even though Jack manages to stay cool through the little jump scare, adding at least some information about the object would go a long way toward defusing the tension for Vika.

Jack’s encounter with the TET gives clear evidence that the TET has sophisticated computer vision, so the interface could help Vika by “guessing” what any questionable object might be. It doesn’t need to be exact (and it probably couldn’t be, with that kind of video feed), but the computer could offer an educated guess by analyzing the object’s context, shape, and motion against things in its database. So instead of reporting an 87% chance of a dog or a 76% chance of a fox, the interface could just predict “unknown animal.”
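
For illustration, a minimal sketch of that fallback logic. The labels, confidences, threshold, and category mapping are all invented for the example, not anything from the film:

```python
def describe_contact(guesses: dict[str, float], threshold: float = 0.95) -> str:
    """Pick a label for a tracked contact. Below the confidence
    threshold, fall back to the contact's broad category rather
    than burdening the operator with competing percentages."""
    label, confidence = max(guesses.items(), key=lambda kv: kv[1])
    if confidence >= threshold:
        return label
    # Hypothetical category lookup; a real system would map every
    # known label to a class like animal / human / drone / debris.
    CATEGORY = {"dog": "animal", "fox": "animal", "scav": "humanoid"}
    return f"unknown {CATEGORY.get(label, 'contact')}"

print(describe_contact({"dog": 0.87, "fox": 0.76}))  # "unknown animal"
```

The design choice here is that the operator never sees raw probabilities, only the most honest label the system can commit to.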


Share off-screen information

Fast viewers saw the unknown object before the warning. For a split second as the object enters the screen, its marker remains blue. So the computer does track movement even when it’s not deemed a threat. The issue is that the computer seems to be tracking movement far beyond the visible area of the screen, but it doesn’t let Vika know something’s coming from off-screen. The display doesn’t need to zoom out to reach the contact—that could distract Vika from following Jack—but it could at least show some kind of signal pointing at the incoming contact.
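
A hedged sketch of one way such a signal might be placed: clamp the contact’s position to the edge of the display and draw the indicator there. The coordinate system, dimensions, and function are all illustrative assumptions:

```python
def edge_marker(contact_xy, view_w=1920, view_h=1080):
    """Clamp an off-screen contact's position to the display edge,
    giving the point where a 'something incoming' arrow should sit.
    Returns None when the contact is already visible."""
    cx, cy = view_w / 2, view_h / 2
    dx, dy = contact_xy[0] - cx, contact_xy[1] - cy
    if abs(dx) <= cx and abs(dy) <= cy:
        return None  # on screen; the normal marker applies
    # Scale the center-to-contact vector until it touches an edge.
    scale = min(cx / abs(dx) if dx else float("inf"),
                cy / abs(dy) if dy else float("inf"))
    return (cx + dx * scale, cy + dy * scale)
```

An arrow at that edge point, oriented along the center-to-contact vector, tells Vika where to expect the contact without zooming out.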

What of multiple contacts?

I’m cautious about what-ifs, since most of this is guesswork—but bear with me. In this sequence the interface tracks just one contact, but how would it behave if there were more than one? If the computer tracks contacts beyond the display Vika is watching, then just marking them is not enough. If Vika needs to tell Jack how many contacts she’s seeing, she needs some sort of overview. Pointing in the direction of each contact is useful, but it means she has to sweep the whole screen to count them. That could easily be fixed by adding a list of all current contacts.
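
To illustrate, a toy sketch of such an overview list, sorted nearest-first so the most urgent contact tops the sidebar. The contact records and format are entirely invented:

```python
# Hypothetical contact records: (id, label, clock_hour, range_m)
contacts = [
    (1, "unknown animal", 6, 40),
    (2, "drone 166", 10, 300),
    (3, "unknown contact", 2, 85),
]

def overview(contacts):
    """A compact sidebar summary so the operator can count and
    prioritize contacts without sweeping the whole display."""
    lines = [f"{label}: {hour} o'clock, {rng} m"
             for _, label, hour, rng in sorted(contacts, key=lambda c: c[3])]
    return [f"{len(contacts)} contacts"] + lines

for line in overview(contacts):
    print(line)
```

The count on the first line is the key addition: Vika can relay “three contacts” to Jack at a glance.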

Show trending

Pausing the film and looking closely, it seems that the only difference between all-is-fine and contact! with the dog is about a meter of distance. What’s more, by the time the interface triggers the warning, the dog is already very close to Jack. If that were a feral dog about to attack, the warning would come far too late.

In such mission-critical monitoring, it’s not enough to show changes of state. The interface should subtly indicate how things are trending—as in, this dog is likely to continue its intercept course and keep closing.
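
As a sketch of that trending logic, assuming the system samples the range to each contact at a fixed interval, one could escalate through an information / suspicion / warning hierarchy based on estimated time-to-contact rather than raw distance. The thresholds and names are invented:

```python
def threat_trend(distances_m, interval_s=1.0, warn_at_s=10.0):
    """Given recent range samples to a contact (oldest first),
    estimate closing speed and seconds-to-contact, and escalate
    the alert state before the contact actually arrives."""
    closing = (distances_m[0] - distances_m[-1]) / (interval_s * (len(distances_m) - 1))
    if closing <= 0:
        return "information"      # holding or opening range
    eta = distances_m[-1] / closing
    return "warning" if eta <= warn_at_s else "suspicion"

# A contact 40 m out and closing at 5 m/s is 8 s away: warn now,
# not when it is already a meter from Jack.
print(threat_trend([50, 45, 40]))  # "warning"
```

The point of the sketch is the escalation: a distant-but-closing contact starts as a suspicion and becomes a warning with time to spare.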

We got this

So to wrap up, the interface does a good enough job, but it could certainly benefit from some design changes. The issues are ones any designer might face when working on a monitoring interface, so they’re worth summarizing.

  • Share all the information that is at hand
  • Give the user the information in the form they might pass it along
  • Assign an easy-to-distinguish hierarchy: information, suspicion, warning
  • Provide best-guesses as to the nature of problems with as much specificity as you can
  • Provide unobtrusive but clear signals about the mode
  • Anticipate and show trending dangers

Her: interface components (2/8)

Depending on how you slice things, the OS1 interface consists of five components and three (and a half) capabilities.


1. An Earpiece

The earpiece is small and wireless, just large enough to fit snugly in the ear and provide an easy handle for pulling it out again. It has two modes. When the earpiece is in Theodore’s ear, it’s in private mode, audible only to him. When the earpiece is out, the speaker is as loud as a human speaking at room volume. It can produce both voice and other sounds, offering a few beeps and boops to signal that it needs attention or that modes have changed.


2. Cameo phone

I think I have to make up a name for this device, and “cameo phone” seems to fit. This small, hand-sized, bi-fold device has one camera on the outside and one on the inside of the recto, and a display screen on the inside of the verso. It folds along its long edge, unlike the old clamshell phones. It has smartphone capabilities and wirelessly communicates with the internet. Theodore occasionally slides his finger left to right across the wood, so it has some touch-gesture sensitivity. A stripe around the outside edge of the cameo can glow red to act as a visual signal to get its user’s attention. This is quite useful when the cameo is folded up and sitting on a nightstand, for instance.

Theodore uses Samantha almost exclusively through the earpiece and cameo phone, and it is this that makes OS1 a wearable system.

3. A beauty-mark camera

Only present for the surrogate sex scene, this small wireless (are we at the point when we can stop specifying that?) camera affixes to the skin and has the appearance of a beauty mark.

4. (Unseen) microphones

Whether in the cameo phone, the desktop screen, or ubiquitously throughout the environment, OS1 can hear Theodore speak wherever he is over the course of the film.

5. Desktop screen

Theodore only uses a large monitor for OS1 on his desktop a few times. It is simply another access point as far as OS1 is concerned. Really, there’s nothing remarkable about this screen. It is notable that there’s no keyboard. All input is provided by either voice, camera, or a touch gesture on the cameo.


If those are the components of the interface, they provide the medium for her 3.5 capabilities.

Her capabilities

1. Voice interface

Users can speak to OS1 in fully-natural language, as if speaking to another person. OS1 speaks back with fully-human spoken articulation. Theodore’s older OS had a voice interface, but because of its lack of artificial intelligence driving it, the interactions were limited to constrained commands like, “Read email.”

2. Computer vision

Samantha can perfectly process what she sees through the camera lens of the cameo. She recognizes distinct objects, people, and gestures at both the physical and pragmatic level. I don’t think we ever see things from Samantha’s perspective, but we do get a few quick close-ups of the camera lens.

3. Artificial Intelligence

The most salient aspect of the interface is that OS1 is a fully realized “Strong” artificial intelligence.

It would be like me to try and get to some painfully-crafted definition of what counts as either artificial intelligence or sentience, but in this case we don’t really need a tight definition to suss out whether or not Samantha is one. That’s the central conceit of the film, and the evidence is overwhelming.

  • She has a human command of language.
  • She’s fully versed in the nuances of human emotion (and Theodore has a glut of them to engage).
  • She has emotions and can fairly be described as emotional. She has a sexual drive.
  • She has existential crises and a rich theory of mind. At one point she dreamily asks Theodore “What’s it like to be alive in that room right now?” as if she were a philosophical teen idly chatting with her boyfriend over the phone.
  • She commits lies of omission in hiding uncomfortable truths.
  • She changes over time. She solves problems. She learns. She creates.
  • She has a sense of humor. When Theodore tells her early on to “read email” in the weird toComputerese (my name for that 1970s dialect of English spoken only between humans and machines) grammar he had been using with his old operating system, Samantha jokingly adopts a robotic voice and replies, “OK. I will read the email for Theodore Twombly” and gets a good laugh out of him before he apologizes.

Pedants will have some fun discussing whether this is apt but I’m moving forward with it as a given. She’s sentient.

3.5 An “operating system”

This item only counts as half a thing because Theodore uses it as an operating system maaaybe twice in the film. Really, this categorization is a MacGuffin to explain why he gets it in the first place, but it has little to no other bearing on the film.


What’s missing?

Notably missing in OS1 is a face or any other visual anthropomorphic aspect. There’s no Samantha-faced Clippy. Notice that she’s very carefully disembodied. Jonze does not spend screen time close up on her camera lens, like Kubrick did with HAL’s unblinking eye. Had he done so, it would have given us the impression that she’s somewhere behind that eye. But she’s not. Even in the prop design, he makes sure the camera lens itself looks unremarkable, neutral, and unexpressive, and never gets a lingering focus.

Her “organs,” like the cameo and earpiece, don’t even connect together physically at all. Speaking as she does through the earpiece means she doesn’t exist as a voice from some speaker mounted to the wall. She exists across various displays and devices, in some psychological ether between them. For us, she’s a voiceover existing everywhere at once. For Theodore, she’s just a delightful voice in his head. An angel—or possibly a ghost—borne unto him.

This disembodiment (both the design and the cinematic treatment) frees Theodore and the audience from the negative associations of many other sci-fi intelligences, robots, and unfortunate experiments in commercial artificial intelligence that got trapped in the muck of the uncanny valley. One of the main reasons designers have to be careful about invoking the anthropomorphic sense in users is because it will raise expectations of human capabilities that modern technology just can’t match. But OS1 can match and exceed those expectations, since it’s an AI in a work of fiction, so Jonze is free of that constraint.

And having no visual to accompany a human-like voice allows us to imagine our own “perfect” embodiment for the voice. Relying on the imagination to provide the visuals makes the emotional engagement greater, as it does with our crushes on radio personalities, or the unseen monster in a horror movie. Movies can never create as fulfilling an image for an individual audience member as their imagination can. Theodore could picture whatever he wanted (if he wanted to) to accompany Samantha’s computer-generated voice. Unfortunately for the audience, Jonze cast Scarlett Johansson, a popular actress whose image we instantly recall upon hearing her husky, sultry voice, so the imagined perfection is more difficult for us.

This is just the components and capabilities. Tomorrow we’ll look at some of the key interactions with OS1.