A special subset of spacesuit interfaces is the communication subsystems. I wrote a whole chapter about Communications in Make It So, but spacesuit comms bear special mention, since they’re usually used in close physical proximity but still must be mediated by technology, the channels for detailed control are clumsy and packed, and these communicators are often being overseen by a mission control center of some sort. You’d think this is rich territory, but spoiler: There’s not a lot of variation to study.
Every single spacesuit in the survey has audio. This is so ubiquitous and accepted that, after 1950, no filmmaker has felt the need to explain it or show an interface for it. So you’d think that we’d see a lot of interactions.
Spacesuit communications in sci-fi tend to be many-to-many with no apparent means of control. Not even a push-to-mute if you sneezed into your mic. It’s as if the spacewalkers were in a group, merely standing near each other in air, chatting. No push-to-talk or volume control is seen. Communication with Mission Control is automatic. No audio cues are given to indicate distance, direction, or source of the sound, or to select a subset of recipients.
The one seeming exception to the many-to-many communication is seen in the reboot of Battlestar Galactica. As Boomer is operating a ship above a ground crew, shining a light down on them for visibility, she has the following conversation with Tyrol.
Raptor 478, this is DC-1, I have you in my sights.
Copy that, DC-1. I have you in sight.
How’s it looking there? Can you tell what happened?
Lieutenant, don’t worry…about my team. I got things under control.
Copy that, DC-1. I feel better knowing you’re on it.
Then, when her copilot gives her a look about what she has just said, she says curtly to him, “Watch the light, you’re off target.” In this exchange there is clear evidence that the copilot has heard the first conversation, but it appears that her comment to him is addressed to him and not for the others to hear. Additionally, we do not hear chatter going on between the ground crew during this exchange. Unfortunately, we do not see any of the conversationalists touch a control to give us an idea about how they switch between these modes. So, you know, still nothing.
More recent films, especially in the MCU, have shown all sorts of communication controlled by voice with the magic of General AI…pause for gif…
…but as I mention more and more, once you have a General AI in the picture, we leave the realm of critique-able interactions. Because an AI did it.
In short, sci-fi just doesn’t care about showing audio controls in sci-fi spacesuits, and isn’t likely to start caring anytime soon. As always, if you know of something outside my survey, please mention it.
For reference, in the real world, a NASA astronaut has direct control over the volume of audio that she hears, using potentiometer volume controls. (Curiously the numbers on them are not backwards, unlike the rest of the controls.)
A spacewalker uses the COMM dial switch mode selector at the top of the DCM to select between three different frequencies of wireless communication, each of which reaches the other spacewalkers and the vehicle. When an astronaut is on one of the first two channels, transmission is voice-activated. But a backup, “party line” channel requires push-to-talk, and this is what the push-to-talk control is for.
By default, all audio is broadcast to all other spacewalkers, the vehicle, and Mission Control. To speak privately, without Mission Control hearing, spacewalkers don’t have an engineered option. But if one of the radio frequency bands happens to be suffering a loss of signal to Mission Control, a spacewalker can use this technological blind spot to talk with some degree of privacy.
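As a rough illustration of that channel logic, here is a minimal Python sketch. The names `CommMode`, `MicState`, and `should_transmit`, and the channel labels themselves, are hypothetical and chosen for illustration. The only rule taken from the description above is that the first two channels are voice-activated while the backup party line needs push-to-talk.

```python
from dataclasses import dataclass
from enum import Enum


class CommMode(Enum):
    """Hypothetical labels for the three selectable channels."""
    PRIMARY = "primary"      # voice-activated
    ALTERNATE = "alternate"  # voice-activated
    BACKUP = "backup"        # party line, push-to-talk


@dataclass
class MicState:
    voice_detected: bool  # the wearer is currently speaking
    ptt_pressed: bool     # the push-to-talk control is held down


def should_transmit(mode: CommMode, mic: MicState) -> bool:
    """Decide whether the radio keys up in the selected mode."""
    if mode is CommMode.BACKUP:
        # Backup party line: speech alone is not enough, PTT must be held.
        return mic.ptt_pressed
    # The first two channels are voice-activated, so PTT is ignored.
    return mic.voice_detected
```

Under these assumptions, speaking on the backup channel without holding the control transmits nothing, which is exactly the kind of mode difference a spacewalker would need to be aware of.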
Remote operation appears twice during Black Panther. This post describes the second, in which CIA Agent Ross remote-pilots the Talon in order to chase down cargo airships carrying Killmonger’s war supplies. The prior post describes the first, in which Shuri remotely drives an automobile.
In this sequence, Shuri equips Ross with kimoyo beads and a bone-conducting communication chip, and tells him that he must shoot down the cargo ships before they cross beyond the Wakandan border. As soon as she tosses a remote-control kimoyo bead onto the Talon, Griot announces to Ross in the lab “Remote piloting system activated” and creates a piloting seat out of vibranium dust for him. Savvy watchers may wonder at this, since Okoye pilots the thing by meditation and Ross would have no meditation-pilot training, but Shuri explains to him, “I made it American style for you. Get in!” He does, grabs the sparkly black controls, and gets to business.
The most remarkable thing to me about the interface is how seamlessly the Talon can be piloted by vastly different controls. Meditation brain control? Can do. Joystick-and-throttle? Just as can do.
Now, generally, I have a beef with the notion of hyperindividualized UI tailoring—it prevents vital communication across a community of practice (read more about my critique of this goal here)—but in this case, there is zero time for Ross to learn a new interface. So sure, give him a control system with which he feels comfortable to handle this emergency. It makes him feel more at ease.
The mutable nature of the controls tells us that there is a robust interface layer that is interpreting whatever inputs the pilot supplies and applying them to the actuators in the Talon. More on this below. Spoiler: it’s Griot.
Too sparse HUD
The HUD presents a simple circle-in-a-triangle reticle that lights up red when a target is in sights. Otherwise it’s notably empty of augmentation. There’s no tunnel in the sky display to describe the ideal path, or proximity warnings about skyscrapers, or airspeed indicator, or altimeter, or…anything. This seems a glaring omission since we can be certain other “American-style” airships have such things. More on why this might be below, but spoiler: It’s Griot.
What do these controls do, exactly?
I take no joy in gotchas. That said…
When Ross launches the Talon, he does so by pulling the right joystick backward.
When he shoots down the first cargo ship over Birnin Zana, he pushes the same joystick forward as he pulls the trigger, firing energy weapons.
Why would the same control do both? It’s hard to believe it’s modal. Extradiegetically, this is probably an artifact of actor Martin Freeman’s just doing what feels dramatic, but for a real-world equivalent I would advise against having physical controls have wholly different modes on the same grip, lest we risk confusing pilots on mission-critical tasks. But spoiler…oh, you know where this is going.
Diegetically, Shuri is flat-out wrong that Ross is an experienced pilot. But she also knew that it didn’t matter, because her lab has him covered anyway. Griot is an AI with a brain interface, and can read Ross’ intentions, handling all the difficult execution itself.
This would also explain the lack of better HUD augmentation. That absence seems especially egregious considering that the first cargo ship was flying over a crowded city at the time it was being targeted. If Ross had fired in the wrong place, the cargo ship might have crashed into a building, or down to the bustling city street, killing people. But, instead, Griot quietly, precisely targets the ship for him, to ensure that it would crash safely in nearby water.
This would also explain how wildly different interfaces can control the Talon with similar efficacy.
So, Occams-apology says, yep, it’s Griot.
An AI-wizard did it?
In the post about Shuri’s remote driving, I suggested that Griot was also helping her execute driving behind the scenes. This hearkens back to both the Iron HUD and Doctor Strange’s Cloak of Levitation. It could be that the MCU isn’t really worrying about the details of its enabling technologies, or that this is a brilliant model for our future relationship with technology. Let us feel like heroes, and let the AI manage all the details. I worry that I’m building myself into a wizard-did-it pattern, inserting AI for wizard. Maybe that’s worth another post all its own.
But there is one other thing about Ross’ interface worth noting.
The sonic overload
When the last of the cargo ships is nearly at the border, Ross reports to Shuri that he can’t chase it, because Killmonger-loyal dragon flyers have “got me trapped with some kind of cables.” She instructs him to, “Make an X with your arms!” He does. A wing-like display appears around him, confirming its readiness.
Then she shouts, “Now break it!” He does, and the Talon goes boom, shaking off the enemy ships and allowing Ross to continue his pursuit.
First, what a great gesture for this function. Very ordinarily, Wakandans are piloting the Talon, and each of them would be deeply familiar with this gesture, and even prone to think of it when executing a Hail Mary move like this.
Second, when an outsider needed to perform the action, why didn’t she just tell Griot to just do it? If there’s an interpretation layer in the system, why not just speak directly to that controller? It might be so the human knows how to do it themselves next time, but this is the last cargo ship he’s been tasked with chasing, and there’s little chance of his officially joining the Wakandan air force. The emergency will be over after this instance. Maybe Wakandans have a principle that they are first supposed to engage the humans before bringing in the machines, but that’s heavy conjecture.
Third, I have a beef about gestures: there are often zero affordances to tell users what gestures they can do, and what effects those gestures will have. If Shuri was not there to answer Ross’ urgent question, would the mission have just…failed? Seems like a bad design.
How else could he have known he could do this? If Griot is on board, Griot could have mentioned it. But avoiding the wizard-did-it solutions, some sort of context-aware display could detect that the ship is tethered to something, and display the gesture on the HUD for him. This violates the principle of letting the humans be the heroes, but would be a critical inclusion in any similar real-world system.
Any time we are faced with “intuitive” controls that don’t map 1:1 to the thing being controlled, we’re faced with similar problems. (We’ve seen the same problems in Sleep Dealer and Lost in Space (1998). Maybe that’s worth its own write-up.) Some controls won’t map to anything. More problematic is that there will be functions which don’t have controls. Designers can’t rely on having a human cavalry like Shuri there to save the day, and should take steps to find ways that the system can inform users of how to activate those functions.
Fit to purpose?
I’ve had to presume a lot about this interface. But if those things are correct, then, sure, this mostly makes it possible for Ross, a novice to piloting, to contribute something to the team mission, while upholding the directive that AI Cannot Be Heroes.
If Griot is not secretly driving, and that directive is not really a thing, then the HUD needs more work, I can’t diegetically explain the controls, and they need to develop just-in-time suggestions to patch the gap of the mismatched interface.
Black Georgia Matters
Each post in the Black Panther review is followed by actions that you can take to support black lives. As this critical special election is still coming up, this is a repeat of the last one, modified to reflect passed deadlines.
Despite outrageous, anti-democratic voter suppression by the GOP, for the first time in 28 years, Georgia went blue for the presidential election, verified with two hand recounts. Credit to Stacey Abrams and her team’s years of effort to get out the Georgian—and particularly the powerful black Georgian—vote.
But the story doesn’t end there. Though the Biden/Harris ticket won the election, if the Senate stays majority red, Moscow Mitch McConnell will continue the infuriating obstructionism with which he held back Obama’s efforts in office for eight years. The Republicans will, as they have done before, ensure that nothing gets done.
To start to undo the damage the fascist and racist Trump administration has done, and maybe make some actual progress in the US, we need the Senate majority blue. Georgia is providing that opportunity. Neither of the wretched Republican incumbents got 50% of the vote, resulting in a special runoff election January 5, 2021. If these two seats go to the Democratic challengers, Warnock and Ossoff, it will flip the Senate blue, and the nation can begin to seriously right the sinking ship that is America.
Residents can also volunteer to become a canvasser for either of the campaigns, though it’s a tough thing to ask in the middle of the raging pandemic.
The rest of us (yes, even non-American readers) can contribute either to the campaigns directly using the links above, or to Stacey Abrams’ Fair Fight campaign. From the campaign’s web site:
We promote fair elections in Georgia and around the country, encourage voter participation in elections, and educate voters about elections and their voting rights. Fair Fight brings awareness to the public on election reform, advocates for election reform at all levels, and engages in other voter education programs and communications.
We will continue moving the country into the anti-racist future regardless of the runoff, but we can make much, much more progress if we win this election. Please join the efforts as best you can even as you take care of yourself and your loved ones over the holidays. So very much depends on it.
Black Reparations Matter
This is timely, so I’m adding this on as well rather than waiting for the next post: A bill is in the House to set up a commission to examine the institution of slavery and its impact and make recommendations for reparations to Congress. If you are an American citizen, please consider sending a message to your congresspeople asking them to support the bill.
On this ACLU site you will find a form and suggested wording to help you along.
before we get into the Kimoyo beads, or the Cape Shields, or the remote driving systems…
before I have to dismiss these interactions as “a wizard did it” style non-designs…
before I review other brain-computer interfaces in other shows…
…I wanted to check on the state of the art of brain-computer interfaces (or BCIs) and see how our understanding had advanced since I wrote the Brain interface chapter in the book, back in the halcyon days of 2012.
Note that I am deliberately avoiding the tech side of this question. I’m not going to talk about EEG, PET, MRI, and fMRI. (Though they’re linked in case you want to learn more.) Modern brain-computer interface (or BCI) technologies are evolving too rapidly to bother with an overview of them. They’ll change in the real world by the time I press “publish,” much less by the time you read this. And sci-fi tech is most often a black box anyway. But the human part of the human-computer interaction model changes much more slowly. We can look to the brain as a relatively-unalterable component of the BCI question, leading us to two believability questions of sci-fi BCI.
How can people express intent using their brains?
How do we prevent accidental activation using BCI?
Let’s discuss each.
1. How can people express intent using their brains?
In the see-think-do loop of human-computer interaction…
See (perceive) has been a subject of visual, industrial, and auditory design.
Think has been a matter of human cognition as informed by system interaction and content design.
Do has long been a matter of some muscular movement that the system can detect, to start its matching input-process-output loop. Tap a button. Move a mouse. Touch a screen. Focus on something with your eyes. Hold your breath. These are all ways of “doing” with muscles.
But the first promise of BCI is to let that doing part happen with your brain. The brain isn’t a muscle, so what actions are BCI users able to take in their heads to signal to a BCI system what they want it to do? The answer to this question is partly physiological, about the way the brain changes as it goes about its thinking business.
Our brains are a dense network of bioelectric signals, chemicals, and blood flow. But it’s not chaos. It’s organized. It’s locally functionalized, meaning that certain parts of the brain are predictably activated when we think about certain things. But it’s not like the Christmas lights in Stranger Things, with one part lighting up discretely at a time. It’s more like an animated proportional symbol map, with lots of places lighting up at the same time to different degrees.
The sizes and shapes of what’s lighting up may change slightly between people, but a basic map of healthy, undamaged brains will be similar to each other. Lots of work has gone on to map these functional areas, with researchers showing subjects lots of stimuli and noting what areas of the brain light up. Test enough of these subjects and you can build a pretty good functional map of concepts. Thereafter, you can take a “picture” of the brain, and you can cross-reference your maps to reverse-engineer what is being thought.
Right now those pictures are pretty crude and slow, but so were the first actual photographs in the world. In 20–50 years, we may be able to wear baseball caps that provide much higher-resolution, real-time input of the concepts being thought. In the far future (or, say, the alternate history of the MCU) it is conceivable to read these things from a distance. (Though there are significant ethical questions involved in such a technology, this post is focused on questions of viability and interaction.)
Similarly the brain maps we have are only for a small percentage of an average adult vocabulary. Jack Gallant’s semantic map viewer (pictured and linked above) shows the maps for about 140 concepts, and estimates of average active vocabulary are around 20,000 words, so we’re looking at less than a tenth of a tenth of what we can imagine (not even counting the infinite composability of language). But in the future we will not only have more concepts mapped, more confidently, but we will also have idiographs for each individual, like the personal dictionary in your smart phone.
All this is to say that our extant real world technology confirms that thoughts are a believable input for a system. This includes linguistic inputs like “Turn on the light” and “activate the vibranium sand table” and “Sincerely, Chris” and even imagining the desired change, like a light changing from dark to light. It might even include subconscious thoughts that have yet to be formed into words.
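For the curious, the back-of-the-envelope arithmetic looks like this in Python. Both figures are the rough ones quoted above, not precise measurements, and the variable names are mine.

```python
mapped_concepts = 140       # rough figure quoted above for the semantic map viewer
active_vocabulary = 20_000  # rough estimate quoted above

# Fraction of an average active vocabulary that has a mapped concept.
coverage = mapped_concepts / active_vocabulary
print(f"{coverage:.1%} of an average active vocabulary")
```

That works out to well under one percent, before even considering how words compose into sentences.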
2. How do we prevent accidental activation?
But we know from personal experience, we don’t want all our thoughts to be acted on. Take, for example, those thoughts you have when you’re feeling hangry, or snarky, or dealing with a jerk-in-authority. Or those texts and emails that you’ve composed in the heat of the moment but wisely deleted before they get you in trouble.
If a speculative BCI is being read by a general artificial intelligence, it can manage that just like a smart human partner would.
He is composing a blog post, reasons the AGI, so I will just disregard his thought that he needs to pee.
And if there’s any doubt, an AGI can ask. “Did you intend me to include the bit about pee in the post?” Me: “Certainly not. Also BRB.” (Readers following the Black Panther reviews will note that AGI is available to Wakandans in the form of Griot.)
If AGI is unavailable to the diegesis (and it would significantly change any diegesis of which it is a part) then we need some way to indicate when a thought is intended as input and when it isn’t. Having that be some mode of thought feels complicated and error-prone, like when programmers have to write regular expressions that escape escape characters. Better I think is to use some secondary channel, like a bodily interaction. Touch forefinger and pinky together, for instance, and the computer understands you intend your thoughts as input.
So, for any BCI that appears in sci-fi, we would want to look for the presence or absence of AGI as a reasonableness interpreter, and, barring that, for some alternate-channel mechanism for indicating deliberateness. We would also hope to see some feedback and correction loops to understand the nuances of the edge-case interactions, but these are rare in sci-fi.
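To make the alternate-channel idea concrete, here is a minimal Python sketch of a gate on thought input. Everything in it is hypothetical: the `Sample` type, the `gate_held` flag (standing in for something like the forefinger-and-pinky touch), and the premise that thoughts arrive already decoded into text. It only illustrates the filtering rule, not any real BCI.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    """One moment of sensing: a decoded thought plus the gate's state."""
    thought: Optional[str]  # decoded thought, or None if nothing was decoded
    gate_held: bool         # e.g. forefinger and pinky touching


def intended_inputs(samples: list[Sample]) -> list[str]:
    """Keep only thoughts that arrived while the gate gesture was held."""
    return [s.thought for s in samples if s.gate_held and s.thought is not None]


stream = [
    Sample("I need to pee", gate_held=False),      # stray thought, ignored
    Sample("turn on the light", gate_held=True),   # deliberate input
    Sample(None, gate_held=True),                  # gate held, nothing decoded
    Sample("Sincerely, Chris", gate_held=True),    # deliberate input
]
print(intended_inputs(stream))
```

The design choice here is that the gate lives entirely outside the thought channel, so no thought, however vivid, can activate the system on its own.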
Even more future-full
This all points to the question of what seeing/perceiving via a BCI might be. A simple example might be a disembodied voice that only the user can hear.
A woman walks alone at night. Lost in thoughts, she hears her AI whisper to her thoughts, “Ada, be aware that a man has just left a shadowy doorstep and is following, half a block behind you. Shall I initialize your shock shoes?”
What other than language can be written to the brain in the far future? Images? Movies? Ideas? A suspicion? A compulsion? A hunch? How will people know what are their own thoughts and what has been placed there from the outside? I look forward to the stories and shows that illustrate new ideas, and warn us of the dark pitfalls.
At around the midpoint of the movie, Deckard calls Rachel from a public videophone in a vain attempt to get her to join him in a seedy bar. Let’s first look at the device, then the interactions, and finally take a critical eye to this thing.
The lower part of the panel is a set of back-lit instructions and an input panel, which consists of a standard 12-key numeric input and a “start” button. Each of these momentary pushbuttons is back-lit white and has a red outline.
In the middle-right of the panel we see an illuminated orange logo panel, bearing the Saul Bass Bell System logo and the text reading, “VID-PHŌN” in some pale yellow, custom sans-serif logotype. The line over the O, in case you are unfamiliar, is a macron, indicating that the vowel below should be pronounced as a long vowel, so the brand should be pronounced “vid-phone” not “vid-fahn.”
In the middle-left there is a red “transmitting” button (in all lower case, a rarity) and a black panel that likely houses the camera and microphone. The transmitting button is dark until he interacts with the 12-key input, see below.
At the top of the panel, a small cathode-ray tube screen at face height displays data before and after the call as well as the live video feed during the call. All the text on the CRT is in a fixed-width typeface. A nice bit of worldbuilding sees this screen covered in Sharpie graffiti.
His interaction is straightforward. He approaches the nook and inserts a payment card. In response, the panel, including its instructions and buttons, illuminates. A confirmation of the card holder’s identity appears in the upper left of the CRT, i.e. “Deckard, R.,” along with his phone number, “555-6328” (Fun fact: if you misdialed those last four numbers you might end up talking to the Ghostbusters) and some additional identifying numbers.
A red legend at the bottom of the CRT prompts him to “PLEASE DIAL.” It is outlined with what look like ASCII box-drawing characters. He presses the START button and then dials “555-7583” on the 12-key. As soon as the first number is pressed, the “transmitting” button illuminates. As he enters digits, they are simultaneously displayed for him on screen.
His hands are not in-frame as he commits the number and the system calls Rachel. So whether he pressed an enter key, #, or *; or the system just recognizes he’s entered seven digits is hard to say.
After their conversation is complete, her live video feed goes blank, and TOTAL CHARGE $1.25, is displayed for his review.
Chapter 10 of the book Make It So: Interaction Design Lessons from Science Fiction is dedicated to Communication, and in this post I’ll use the framework I developed there to review the VID-PHŌN, with one exception: this device is public and Deckard has to pay to use it, so he has to specify a payment method, and then the system will report back total charges. That wasn’t in the original chapter and in retrospect, it should have been.
Turns out this panel is just the right height for Deckard. How do people of different heights or seated in a wheelchair fare? It would be nice if it had some apparent ability to adjust for various body heights. Similarly, I wonder how it might work for differently-abled users, but of course in cinema we rarely get to closely inspect devices for such things.
Deckard has to insert a payment card before the screen illuminates. It’s nice that the activation entails specifying payment, but how would someone new to the device know to do this? At the very least there should be some illuminated call to action like “insert payment card to begin,” or better yet some iconography so there is no language dependency. Then when the payment card is inserted, the rest of the interface can illuminate and act as a sort of dial-tone that says, “OK, I’m listening.”
Specifying a recipient: Unique Identifier
In Make It So, I suggest five methods of specifying a recipient: fixed connection, operator, unique identifier, stored contacts, and global search. Since this interaction is building on the experience of using a 1982 public pay phone, the 7-digit identifier quickly helps audiences familiar with American telephone standards understand what’s happening. So even if Scott had foreseen the phone explosion that led in 1994 to the ten-digit-dialing standard, or the 2053 events that led to the thirteen-digit-dialing standard, showing it would likely have confused audiences and slightly risked the read of this scene. It’s forgivable.
I have a tiny critique over the transmitting button. It should only turn on once he’s finished entering the phone number. That way they’re not wasting bandwidth on his dialing speed or on misdials. Let the user finish, review, correct if they need to, and then send. But, again, this is 1982 and direct entry is the way phones worked. If you misdialed, you had to hang up and start over again. Still, I don’t think having the transmitting button light up after he entered the 7th digit would have caused any viewers to go all hruh?
There are important privacy questions to displaying a recipient’s number in a way that any passer-by can see. Better would have been to mount the input and the contact display on a transverse panel where he could enter and confirm it with little risk of lookie-loos and identity thieves.
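Here is a minimal Python sketch of that finish-review-correct-send flow. The `DialBuffer` class and its method names are hypothetical, and the seven-digit length is simply what the scene shows. The only point being illustrated is that nothing is transmitted until the caller explicitly commits a complete number.

```python
class DialBuffer:
    """Collects digits locally and transmits only on an explicit send."""

    def __init__(self, required_digits: int = 7):
        self.required_digits = required_digits
        self.digits: list[str] = []
        self.transmitting = False  # the "transmitting" light stays dark while dialing

    def press(self, digit: str) -> None:
        """Add a digit, unless the number is full or already sent."""
        if (digit.isdigit() and not self.transmitting
                and len(self.digits) < self.required_digits):
            self.digits.append(digit)

    def backspace(self) -> None:
        """Correct a misdial without hanging up and starting over."""
        if not self.transmitting and self.digits:
            self.digits.pop()

    def display(self) -> str:
        """Echo the number so far, for review before sending."""
        number = "".join(self.digits)
        return number if len(number) <= 3 else f"{number[:3]}-{number[3:]}"

    def send(self) -> bool:
        """Start transmitting only when the number is complete."""
        if len(self.digits) == self.required_digits:
            self.transmitting = True
        return self.transmitting
```

With something like this, a caller who fat-fingers the last digit can back up one key, fix it, and send, and the light only comes on at that last step.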
Audio & Video
Hopefully, when Rachel received the call, she was informed who it was and that the call was coming from a public video phone. Hopefully it also provided controls for only accepting the audio, in case she was not camera-ready, but we don’t see things from her side in this scene.
Gaze correction is usually needed in video conversation systems since each participant naturally looks at the center of the screen and not at the camera lens mounted somewhere next to its edge. Unless the camera is located in the center of the screen (or the other person’s image on the screen), people would not be “looking” at the other person as is almost always portrayed. Instead, their gaze would appear slightly off-screen. This is a common trope in cinema, but one which we’ve become increasingly literate in, as many of us are working from home much more and gaining experience with videoconferencing systems, so it’s beginning to strain suspension of disbelief.
Also how does the sound work here? It’s a noisy street scene outside of a cabaret. Is it a directional mic and directional speaker? How does he adjust the volume if it’s just too loud? How does it remain audible yet private? Small directional speakers that followed his head movements would be a lovely touch.
And then there’s video privacy. If this were the real world, it would be nice if the video had a privacy screen filter. That would have the secondary effect of keeping his head in the right place for the camera. But that is difficult to show cinegenically, so wouldn’t work for a movie.
Ending the call
Rachel leans forward to press a button on her home video phone to end her part of the call. Presumably Deckard has a similar button to press on his end as well. He should be able to just yank his card out, too.
The closing screen is a nice touch, though total charges may not be the most useful thing. Are VID-PHŌN calls a fixed price? Then this information is not really of use to him after the call as much as it is beforehand. If the call has a variable cost, depending on long distance and duration, for example, then he would want to know the charges as the call is underway, so he can wrap things up if it’s getting too expensive. (Admittedly the Bell System wouldn’t want that, so it’s sensible worldbuilding to omit it.) Also if this is a pre-paid phone card, seeing his remaining balance would be more useful.
But still, the point was that the total charge of $1.25 was meant to future-shock audiences of the time, since public phone charges in the United States at the time were $0.10. His remaining balance wouldn’t have shown that, and so would not have had the desired effect. Maybe both? It might have been a cool bit of worldbuilding and callback to build on that shock to follow that outrageous price with “Get this call free! Watch a video of life in the offworld colonies! Press START and keep your eyes ON THE SCREEN.”
Back to Blade Runner. I mean, the pandemic is still pandemicking, but maybe this will be a nice distraction while you shelter in place. Because you’re smart, sheltering in place as much as you can, and not injecting disinfectants. And, like so many other technologies in this film, this will take a while to deconstruct, critique, and reimagine.
Doing his detective work, Deckard retrieves a set of snapshots from Leon’s hotel room, and he brings them home. Something in the one pictured above catches his eye, and he wants to investigate it in greater detail. He takes the photograph and inserts it in a black device he keeps in his living room.
Note: I’ll try and describe this interaction in text, but it is much easier to conceptualize after viewing it. Owing to copyright restrictions, I cannot upload this length of video with the original audio, so I have added pre-rendered closed captions to it, below. All dialogue in the clip is Deckard.
He inserts the snapshot into a horizontal slit and turns the machine on. A thin, horizontal orange line glows on the left side of the front panel. A series of seemingly random-length orange lines begin to chase one another in a single-row space that stretches across the remainder of the panel and continue to do so throughout Deckard’s use of it. (Imagine a news ticker, running backwards, where the “headlines” are glowing amber lines.) This seems useless and an absolutely pointless distraction for Deckard, putting high-contrast motion in his peripheral vision, which fights for attention with the actual, interesting content down below.
After a second, the screen reveals a blue grid, behind which the scan of the snapshot appears. He stares at the image in the grid for a moment, and speaks a set of instructions, “Enhance 224 to 176.”
In response, three data points appear overlaying the image at the bottom of the screen. Each has a two-letter label and a four-digit number, e.g. “ZM 0000 NS 0000 EW 0000.” The NS and EW—presumably North-South and East-West coordinates, respectively—immediately update to read, “ZM 0000 NS 0197 EW 0334.” After updating the numbers, the screen displays a crosshairs, which target a single rectangle in the grid.
A new rectangle then zooms in from the edges to match the targeted rectangle, as the ZM number—presumably zoom, or magnification—increases. When the animated rectangle reaches the targeted rectangle, its outline blinks yellow a few times. Then the contents of the rectangle are enlarged to fill the screen, in a series of steps which are punctuated with sounds similar to a mechanical camera aperture. The enlargement is perfectly resolved. The overlay disappears until the next set of spoken commands. The system response between Deckard’s issuing the command and the device’s showing the final enlarged image is about 11 seconds.
Deckard studies the new image for a while before issuing another command. This time he says, “Enhance.” The image enlarges in similar clacking steps until he tells it, “Stop.”
Other instructions he is heard to give include “move in, pull out, track right, center in, pull back, center, and pan right.” Some include discrete instructions, such as, “Track 45 right” while others are relative commands that the system obeys until told to stop, such as “Go right.”
Using such commands he isolates part of the image that reveals an important clue, and he speaks the instruction, “Give me a hard copy right there.” The machine prints the image, which Deckard uses to help find the replicant pictured.
I’d like to point out one bit of sophistication before the critique. Deckard can issue a command with or without a parameter, and the inspector knows what to do. For example, “Track 45 right” and “Track right.” Without the parameter, it will just do the thing repeatedly until told to stop. That helps Deckard issue the same basic command when he knows exactly where he wants to look and when he doesn’t know exactly what he’s looking for. That’s a nice feature of the language design.
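To make the with-or-without-a-parameter idea concrete, here is a minimal Python sketch of how such a grammar might be parsed. The verb and direction lists, the `Command` type, and the `parse` function are all hypothetical names invented for illustration; nothing in the film specifies how the inspector actually works.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabularies, loosely based on commands heard in the scene.
VERBS = ("track", "pan", "go", "move", "pull")
DIRECTIONS = ("left", "right", "in", "out", "back")


@dataclass
class Command:
    verb: str
    direction: str
    amount: Optional[int]  # None means "keep going until told to stop"

    @property
    def continuous(self) -> bool:
        return self.amount is None


# verb, then an optional number, then a direction: "track 45 right" or "track right"
PATTERN = re.compile(
    r"^(?P<verb>[a-z]+)\s+(?:(?P<amount>\d+)\s+)?(?P<direction>[a-z]+)$"
)


def parse(utterance: str) -> Command:
    match = PATTERN.match(utterance.strip().lower())
    if not match or match["verb"] not in VERBS or match["direction"] not in DIRECTIONS:
        raise ValueError(f"unrecognized command: {utterance!r}")
    amount = match["amount"]
    return Command(match["verb"], match["direction"], int(amount) if amount else None)
```

With this sketch, `parse("Track 45 right")` yields a discrete command with `amount=45`, while `parse("Track right")` yields one whose `continuous` property is true, so the same verb covers both cases.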
But still, asking him to provide step-by-step instructions in this clunky way feels like some high-tech Big Trak. (I tried to find a reference that was as old as the film.) And that’s not all…
Some critiques, as it is
Can I go back and mention that amber distracto-light? Because it’s distracting. And pointless. I’m not mad. I’m just disappointed.
It sure would be nice if any of the numbers on screen made sense, and had any bearing on the numbers Deckard speaks, at any time during the interaction. For instance, the initial zoom (I checked in Photoshop) is around 304%, which is neither the 224 nor the 176 that Deckard speaks.
It might be that each square has a number, and he simply has to name the two squares at the extents of the zoom he wants, letting the machine find the extents, but where is the labeling? Did he have to memorize an address for each pixel? How does that work at arbitrary levels of zoom?
And if he’s memorized it, why show the overlay at all?
Why the seizure-inducing flashing in the transition sequences? Sure, I get that lots of technologies have unfortunate effects when constrained by mechanics, but this is digital.
Why is the printed picture so unlike the still image where he asks for a hard copy?
Gaze at the reflection in Ford’s hazel, hazel eyes, and it’s clear he’s playing Missile Command, rather than paying attention to this interface at all. (OK, that’s the filmmaker’s issue, not a part of the interface, but still, come on.)
How might it be improved for 1982?
So if 1982 Ridley Scott was telling me in post that we couldn’t reshoot Harrison Ford, and we had to make it just work with what we had, here’s what I’d do…
Squash the grid so the cells match the 4:3 ratio of the NTSC screen. Overlay the address of each cell, while highlighting column and row identifiers at the edges. Have the first cell’s outline illuminate as he speaks it, and have the outline expand to encompass the second named cell. Then zoom, removing the cell labels during the transition. When at anything other than full view, display a map across four cells that shows the zoom visually in the context of the whole.
With this interface, the structure of the existing conversation makes more sense. When Deckard said, “Enhance 203 to 608” the thing would zoom in on the mirror, and the small map would confirm.
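One way the cell addressing could work, sketched in Python. Everything here is an assumption made for illustration: a 10×10 grid, and three-digit addresses where the first digit is the column and the last two are the row (so “203” is column 2, row 3). The film specifies none of this.

```python
from typing import NamedTuple, Tuple

COLS, ROWS = 10, 10  # hypothetical grid size


class CellRect(NamedTuple):
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive


def parse_address(address: str) -> Tuple[int, int]:
    """Split a hypothetical 3-digit address into (column, row)."""
    if len(address) != 3 or not address.isdigit():
        raise ValueError(f"expected a 3-digit cell address, got {address!r}")
    column, row = int(address[0]), int(address[1:])
    if column >= COLS or row >= ROWS:
        raise ValueError(f"cell {address!r} is outside the {COLS}x{ROWS} grid")
    return column, row


def zoom_rect(first: str, second: str) -> CellRect:
    """Smallest rectangle of cells that contains both named cells."""
    (c1, r1), (c2, r2) = parse_address(first), parse_address(second)
    return CellRect(min(c1, c2), min(r1, r2), max(c1, c2) + 1, max(r1, r2) + 1)


def magnification(rect: CellRect) -> float:
    """Largest zoom factor that still shows the whole rectangle."""
    return min(COLS / (rect.right - rect.left), ROWS / (rect.bottom - rect.top))
```

Under these assumptions, “Enhance 203 to 608” selects columns 2 through 6 and rows 3 through 8, and the order in which the two cells are spoken does not matter.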
The numbers wouldn’t match up, but it’s pretty obvious from the final cut that Scott didn’t care about that (or, more charitably, ran out of time). Anyway I would be doing this under protest, because I would argue this interaction needs to be fixed in the script.
How might it be improved for 2020?
What’s really nifty about this technology is that it’s not just a photograph. Look close in the scene, and Deckard isn’t just doing CSI Enhance! commands (or, to be less mocking, AI upscaling). He’s using the photo inspector to look around corners and at objects that are reconstructed from the smallest reflections. So we can think of the interaction like he’s controlling a drone through a 3D still life, looking for a lead to help him further the case.
With that in mind, let’s talk about the display.
To redesign it, we have to decide at a foundational level how we think this works, because it will color what the display looks like. Is this all data that’s captured from some crazy 3D camera and available in the image? Or is it being inferred from details in the two-dimensional image? Let’s call the first the 3D capture, and the second the 3D inference.
If we decide this is a 3D capture, then all the data that he observes through the machine has the same degree of confidence. If, however, we decide this is a 3D inferrer, Deckard needs to treat the inferred data with more skepticism than the data the camera directly captured. The 3D inferrer is the harder problem, and raises some issues that we must deal with in modern AI, so let’s just say that’s the way this speculative technology works.
The first thing the display should do is make it clear what is observed and what is inferred. How you do this is partly a matter of visual design and style, but partly a matter of diegetic logic. The first pass would be to render everything in the camera frustum photo-realistically, and then render everything outside of that in a way that signals its confidence level. The comp below illustrates one way this might be done.
In the comp, Deckard has turned the “drone” from the “actual photo,” seen off to the right, toward the inferred space on the left. The monochrome color treatment provides that first high-confidence signal.
In the scene, the primary inference would come from reading the reflections in the disco ball overhead lamp, maybe augmented with plans for the apartment that could be found online, or maybe purchase receipts for appliances, etc. Everything it can reconstruct from the reflection and high-confidence sources has solid black lines, a second-level signal.
The smaller knickknacks that are out of the reflection of the disco ball, and implied from other, less reflective surfaces, are rendered without the black lines and blurred. This provides a signal that the algorithm has a very low confidence in its inference.
This is just one (not very visually interesting) way to handle it, but should illustrate that, to be believable, the photo inspector shouldn’t have a single rendering style outside the frustum. It would need something akin to these levels to help Deckard instantly recognize how much he should trust what he’s seeing.
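The three tiers described above can be sketched as a tiny Python lookup. The `Style` names, the `render_style` function, and the 0.8 cutoff are hypothetical; the point is only that the renderer needs both an observed-or-inferred flag and a confidence score for every object.

```python
from enum import Enum


class Style(Enum):
    PHOTOREAL = "full color, as captured"
    OUTLINED = "monochrome with solid black lines"
    BLURRED = "monochrome, no lines, blurred"


HIGH_CONFIDENCE = 0.8  # hypothetical cutoff between the two inferred tiers


def render_style(observed: bool, confidence: float) -> Style:
    """Pick a rendering treatment for one object in the scene."""
    if observed:
        # Inside the camera frustum: show what the camera actually captured.
        return Style.PHOTOREAL
    if confidence >= HIGH_CONFIDENCE:
        # Reconstructed from strong sources such as the lamp reflection.
        return Style.OUTLINED
    # Implied only by weak evidence, so signal low trust.
    return Style.BLURRED
```

A real treatment would probably vary continuously with confidence rather than in two steps, but even this coarse version gives Deckard a visible reason to doubt the knickknacks.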
Flat screen or volumetric projection?
Modern CGI loves big volumetric projections. (e.g. it was the central novum of last year’s Fritz winner, Spider-Man: Far From Home.) And it would be a wonderful juxtaposition to see Deckard in a holodeck-like recreation of Leon’s apartment, with all the visual treatments described above.
…that would kind of spoil the mood of the scene. This isn’t just about Deckard’s finding a clue, we also see a little about who he is and what his life is like. We see the smoky apartment. We see the drab couch. We see the stack of old detective machines. We see the neon lights and annoying advertising lights swinging back and forth across his windows. Immersing him in a big volumetric projection would lose all this atmospheric stuff, and so I’d recommend keeping it either a small contained VP, like we saw in Minority Report, or just keep it a small flat screen.
OK, so we have an idea about how the display would (and shouldn’t) look, let’s move on to talk about the inputs.
To talk about inputs, then, we have to return to a favorite topic of mine, and that is the level of agency we want for the interaction. In short, we need to decide how much work the machine is doing. Is the machine just a manual tool that Deckard has to manipulate to get it to do anything? Or does it actively assist him? Or, lastly, can it even do the job while his attention is on something else—that is, can it act as an agent on his behalf? Sophisticated tools can be a blend of these modes, but for now, let’s look at them individually.
The manual tool is how the photo inspector works in Blade Runner. It can do things, but Deckard has to tell it exactly what to do. But we can still improve it in this mode.
We could give him well-mapped physical controls, like a remote control for this conceptual drone. Flight controls wind up being a recurring topic on this blog (and even came up already in the Blade Runner reviews with the Spinners) so I could go on about how best to do that, but I think that a handheld controller would ruin the feel of this scene, like Deckard was sitting down to play a video game rather than do off-hours detective work.
Similarly, we could talk about a gestural interface, using some of the synecdochic techniques we’ve seen before in Ghost in the Shell. But again, this would spoil the feel of the scene, having him look more like John Anderton in front of a tiny-TV version of Minority Report’s famous crime scrubber.
One of the things that gives this scene its emotional texture is that Deckard is drinking a glass of whiskey while doing his detective homework. It shows how low he feels. Throwing one back is clearly part of his evening routine, so much a habit that he does it despite being preoccupied about Leon’s case. How can we keep him on the couch, with his hand on the lead crystal whiskey glass, and still investigating the photo? Can he use it to investigate the photo?
Here I recommend a bit of ad-hoc tangible user interface. I first backworlded this for The Star Wars Holiday Special, but I think it could work here, too. Imagine that the photo inspector has a high-resolution camera on it, and the interface allows Deckard to declare any object that he wants as a control object. After the declaration, the camera tracks the object against a surface, using the changes to that object to control the virtual camera.
In the scene, Deckard can declare the whiskey glass as his control object, and the arm of his couch as the control surface. Of course the virtual space he’s in is bigger than the couch arm, but it could work like a mouse and a mousepad. He can just pick it up and set it back down again to extend motion.
This scheme takes into account all movement except vertical lift and drop. This could be a gesture or a spoken command (see below).
Going with this interaction model means Deckard can use the whiskey glass, allowing the scene to keep its texture and feel. He can still drink and get his detective on.
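The mouse-and-mousepad behavior can be sketched in a few lines of Python. This assumes the inspector's camera reports the glass's position on the couch arm each frame, and reports nothing while the glass is lifted; the `GlassTracker` class and its `gain` parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, float]


@dataclass
class GlassTracker:
    gain: float = 10.0           # hypothetical: virtual units per unit of couch-arm travel
    camera: Point = (0.0, 0.0)   # position of the virtual "drone"
    _last: Optional[Point] = None

    def update(self, position: Optional[Point]) -> Point:
        """Feed one tracked frame. Pass None while the glass is lifted."""
        if position is not None and self._last is not None:
            dx = position[0] - self._last[0]
            dy = position[1] - self._last[1]
            self.camera = (self.camera[0] + dx * self.gain,
                           self.camera[1] + dy * self.gain)
        # After a lift, the next touchdown only re-anchors; it does not move the camera.
        self._last = position
        return self.camera
```

Because only movement between consecutive on-surface frames counts, Deckard can slide the glass, lift it for a sip, set it down anywhere, and slide again to keep travelling in the same direction.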
Indirect manipulation is helpful for when Deckard doesn’t know what he’s looking for. He can look around, and get close to things to inspect them. But when he knows what he’s looking for, he shouldn’t have to go find it. He should be able to just ask for it, and have the photo inspector show it to him. This requires that we presume some AI. And even though Blade Runner clearly includes General AI, let’s presume that that kind of AI has to be housed in a human-like replicant, and can’t be squeezed into this device. Instead, let’s just extend the capabilities of Narrow AI.
Some of this will be navigational and specific, “Zoom to that mirror in the background,” for instance, or, “Reset the orientation.” Some will be more abstract and content-specific, e.g. “Head to the kitchen” or “Get close to that red thing.” If it had gaze detection, he could even indicate a location by looking at it. “Get close to that red thing there,” for example, while looking at the red thing. Given the 3D inferrer nature of this speculative device, he might also want to trace the provenance of an inference, as in, “How do we know this chair is here?” This implies natural language generation as well as understanding.
There’s nothing stopping him from using the same general commands heard in the movie, but I doubt anyone would want to use those when they have commands like this and the object-on-hand controller available.
Ideally Deckard would have some general search capabilities as well, to ask questions and test ideas. “Where were these things purchased?” or subsequently, “Is there video footage from the stores where he purchased them?” or even, “What does that look like to you?” (The correct answer would be, “Well that looks like the mirror from the Arnolfini portrait, Ridley…I mean…Rick*”) It can do pattern recognition and provide as much extra information as it has access to, just like Google Lens or IBM Watson image recognition do.
Finally, he should be able to ask after simple facts to see if the inspector knows or can find it. For example, “How many people are in the scene?”
All of this still requires that Deckard initiate the action, and we can augment it further with a little agentive thinking.
To think in terms of agents is to ask, “What can the system do for the user, but not requiring the user’s attention?” (I wrote a book about it if you want to know more.) Here, the AI should be working alongside Deckard. Not just building the inferences and cataloguing observations, but doing anomaly detection on the whole scene as it goes. Some of it is going to be pointless, like “Be aware the butter knife is from IKEA, while the rest of the flatware is Christofle Lagerfeld. Something’s not right, here.” But some of it Deckard will find useful. It would probably be up to Deckard to review summaries and decide which were worth further investigation.
It should also be able to help him with his goals. For example, the police had Zhora’s picture on file. (And her portrait even rotates in the dossier we see at the beginning, so it knows what she looks like in 3D for very sophisticated pattern matching.) The moment the agent—while it was reverse ray tracing the scene and reconstructing the inferred space—detects any faces, it should run the face through a most wanted list, and specifically Deckard’s case files. It shouldn’t wait for him to find it. That again poses some challenges to the script. How do we keep Deckard the hero when the tech can and should have found Zhora seconds after being shown the image? It’s a new challenge for writers, but it’s becoming increasingly important for believability.
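As a rough Python sketch of that agentive behavior: every face the inspector reconstructs gets scored against the case files, and anything over a threshold is surfaced without being asked. The `screen_faces` function and `Match` type are hypothetical, the scores are made-up inputs, and the 0.66 default simply mirrors the threshold the inspector cites in the rewritten scene.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

ALERT_THRESHOLD = 0.66  # mirrors "My threshold is set to 66%" in the rewritten scene


@dataclass
class Match:
    name: str
    score: float


def screen_faces(
    scores: Dict[str, float], threshold: float = ALERT_THRESHOLD
) -> Tuple[List[Match], List[Match]]:
    """Split similarity scores into proactive alerts and quiet candidates."""
    alerts: List[Match] = []
    candidates: List[Match] = []
    # Highest scores first, so the strongest match leads each list.
    for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        (alerts if score >= threshold else candidates).append(Match(name, score))
    return alerts, candidates
```

A narrow threshold like this is also a handy writer's tool: a 63% match stays in the quiet list until the detective asks about it, which leaves him something to do.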
Interior. Deckard’s apartment. Night.
Deckard grabs a bottle of whiskey, a glass, and the photo from Leon’s apartment. He sits on his couch and places the photo on the coffee table.
The machine on top of a cluttered end table comes to life.
Let’s look at this.
He points to the photo. A thin line of light sweeps across the image. The scanned image appears on the screen, pulled in a bit from the edges. A label reads, “Extending scene,” and we see wireframe representations of the apartment outside the frame begin to take shape. A small list of anomalies begins to appear to the left. Deckard pours a few fingers of whiskey into the glass. He takes a drink before putting the glass on the arm of his couch. Small projected graphics appear on the arm facing the inspector.
OK. Anyone hiding? Moving?
No and no.
Zoom to that arm and pin to the face.
He turns the glass on the couch arm counterclockwise, and the “drone” revolves around to show Leon’s face, with the shadowy parts rendered in blue.
What’s the confidence?
On the side of the screen the inspector overlays Leon’s police profile.
Deckard lifts his glass to take a drink. He moves from the couch to the floor to stare more intently and places his drink on the coffee table.
He turns the glass clockwise. The camera turns and he sees into a bedroom.
How do we have this much inference?
The convex mirror in the hall…
Wait. Is that a foot? You said no one was hiding.
The individual is not hiding. They appear to be sleeping.
Deckard rolls his eyes.
Zoom to the face and pin.
The view zooms to the face, but the camera is level with her chin, making it hard to make out the face. Deckard tips the glass forward and the camera rises up to focus on a blue, wireframed face.
That look like Zhora to you?
The inspector overlays her police file.
63% of it does.
Why didn’t you say so?
My threshold is set to 66%.
Give me a hard copy right there.
He raises his glass and finishes his drink.
This scene keeps the texture and tone of the original, and camps on the limitations of Narrow AI to let Deckard be the hero. And doesn’t have him programming a virtual Big Trak.
Another incidental interface is the pregnancy test that Joe finds in the garbage. We don’t see how the test is taken, which would be critical when considering its design. But we do see the results display in the orange light of Joe and Beth’s kitchen. It’s a cartoon baby with a rattle, swaying back and forth.
Sure it’s cute, but let’s note that the news of a pregnancy is not always good news. If the pregnancy is not welcome, the “Lucky you!” graphic is just going to rip her heart out. Much better is an unambiguous but neutral signal.
That said, Black Mirror is all about ripping our hearts out, so the cuteness of this interface is quite fitting to the world in which this appears. Narratively, it’s instantly recognizable as a pregnancy test, even to audience members who are unfamiliar with such products. It also sets up the following scene where Joe is super happy for the news, but Beth is upset that he’s seen it. So, while it’s awful for the real world; for the show, this is perfect.
After Joe confronts Beth and she calls for help, Joe is taken to a police station where in addition to the block, he now has a GPS-informed restraining order against him.
To confirm the order, Joe has to sign his name to a paper and then press his thumbprints into rectangles along the bottom. The design of the form is well done, with a clearly indicated spot for his signature, and large touch areas in which he might place his thumbs for his thumbprints to be read.
A scary thing in the interface is that the text of what he’s signing is still appearing while he’s providing his thumbprints. Of course the page could be on a loop that erases and redisplays the text repeatedly for emphasis. But, if it was really downloading and displaying it for the first time to draw his attention, then he has provided his signature and thumbprints too early. He doesn’t yet know what he’s signing.
Government agencies work like this all the time and citizens comply because they have no choice. But ideally, if he tried to sign or place his thumbprints before seeing all the text of what he’s signing, it would be better for the interface to reject his signature with a note that he needs to finish reading the text before he can confirm he has read and understands it. Otherwise, if the data shows that he authenticated it before the text appeared, I’d say he had a pretty good case to challenge the order in court.
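The gating rule is simple enough to sketch in Python. The `ConsentForm` class, its fields, and the message strings are hypothetical; the only idea taken from the scene is that a signature should not count until all of the text has been displayed.

```python
from dataclasses import dataclass


@dataclass
class ConsentForm:
    total_chars: int          # length of the full order text
    shown_chars: int = 0      # how much has been displayed so far
    signed: bool = False

    def reveal(self, count: int) -> None:
        """Display the next `count` characters of the order."""
        self.shown_chars = min(self.total_chars, self.shown_chars + count)

    @property
    def fully_displayed(self) -> bool:
        return self.shown_chars >= self.total_chars

    def sign(self) -> str:
        """Accept the signature only once the whole text is on screen."""
        if not self.fully_displayed:
            return "Please finish reading the order before signing."
        self.signed = True
        return "Signature accepted."
```

Displaying the text is of course not the same as reading it, but logging the two events in this order at least removes the easy legal challenge.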
Virtual Greta has a console to perform her slavery duties. Matt explains what this means right after she wakes up by asking her how she likes her toast. She answers, “Slightly underdone.”
He puts slices of bread in a toaster and instructs her, “Think about how you like it, and just press the button.”
She asks, incredulously, “Which one?” and he explains, “It doesn’t matter. You already know you’re making toast. The buttons are symbolic mostly, anyway.”
She cautiously approaches the console and touches a button in the lower left corner. In response, the toaster drops the carriage lever and begins toasting.
“See?” he asks, “This is your job now. You’re in charge of everything here. The temperature. The lighting. The time the alarm clock goes off in the morning. If there’s no food in the refrigerator, you’re in charge of ordering it.”
EYE-LINK is an interface used between a person at a desktop who uses support tools to help another person who is live “in the field” using Zed-Eyes. The working relationship between the two is very like Vika and Jack in Oblivion, or like the A.I. in Sight.
In this scene, we see EYE-LINK used by a pick-up artist, Matt, who acts as a remote “wingman” for pick-up student Harry. Matt has a group video chat interface open with paying customers eager to lurk, comment, and learn from the master.
Harry wears a hidden camera and microphone. This is the only tech he seems to have on him, only hearing his wingman’s voice, and only able to communicate back to his wingman by talking generally, talking about something he’s looking at, or using pre-arranged signals.
Tap your beer twice if this is more than a little creepy.
A smaller transparent information panel for automated analysis, research, and advice.
An extra, laptop-like screen where Matt leads a group video chat with a paying audience, who are watching and snarkily commenting on the wingman scenario. It seems likely that this is not an official part of the EYE-LINK software.
In the prior three posts, I’ve discussed the goods-and-bads of the Eye of Agamotto in the Tibet mode. (I thought I could squeeze the Hong Kong and the Dark Dimension modes into one post, but it turns out this one was just too long. Keep reading. You’ll see.) In this post we examine a mode that looks like the Tibet mode, but is actually quite different.
Hong Kong mode
Near the film’s climax, Strange uses the Eye to reverse Kaecilius’ destruction of the Hong Kong Sanctum Sanctorum (and much of the surrounding cityscape). In this scene, Kaecilius leaps at Strange, and Strange “freezes” Kaecilius in midair with the saucer. It’s done more quickly, but similarly to how he “freezes” the apple into a controlled-time mode in Tibet.
But then we see something different, and it complicates everything. As Strange twists the saucer counterclockwise, the cityscape around him—not just Kaecilius—begins to reverse slowly. (And unlike in Tibet, the saucer keeps spinning clockwise underneath his hand.) Then the rate of reversal accelerates, and even continues in its reversal after Strange drops his gesture and engages in a fight with Kaecilius, who somehow escapes the reversing time stream to join Strange and Mordo in the “observer” time stream.
So in this mode, the saucer is working much more like a shuttle wheel with no snap-back feature.
A shuttle wheel, as you’ll recall from the first post, doesn’t specify an absolute value along a range like a jog dial does. A shuttle wheel indicates a direction and rate of change. A little to the left is slow reverse. Far to the left is fast reverse. Nearly all of the shuttle wheels we use in the real world have snap-back features, because if you were just going to leave it reversing and pay attention to something else, you might as well use another control to get to the absolute beginning, like a jog dial. But, since Strange is scrubbing an endless “video stream,” (that is, time), and he can pull people and things out of the manipulated-stream and into the observer-stream and do stuff, not having a snap-back makes sense.
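The difference between the two controls, and what snap-back changes, can be sketched in Python. The function and class names, the units, and the `max_rate` parameter are all hypothetical and chosen only to illustrate the distinction.

```python
def jog_position(angle_deg: float, start: float, end: float) -> float:
    """Jog dial: the dial angle picks an absolute point within a bounded range."""
    fraction = (angle_deg % 360) / 360
    return start + fraction * (end - start)


class ShuttleWheel:
    """Shuttle wheel: deflection picks a signed rate of change, not a position."""

    def __init__(self, max_rate: float, snap_back: bool = True) -> None:
        self.max_rate = max_rate    # time units scrubbed per second at full deflection
        self.snap_back = snap_back
        self.deflection = 0.0       # -1.0 is full reverse, +1.0 is full forward
        self.offset = 0.0           # total time scrubbed since the start

    def turn(self, deflection: float) -> None:
        self.deflection = max(-1.0, min(1.0, deflection))

    def release(self) -> None:
        # A conventional wheel returns to center; the Eye apparently does not.
        if self.snap_back:
            self.deflection = 0.0

    def tick(self, seconds: float) -> float:
        self.offset += self.deflection * self.max_rate * seconds
        return self.offset
```

With `snap_back=False`, calling `release()` leaves the deflection in place, so time keeps reversing while the user's hands are busy with something else, such as a fistfight.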
For the Tibet mode I argued for a chapter ring to provide some context and information about the range of values he’s scrubbing. So for shuttling along the past in the Hong Kong mode, I don’t think a chapter ring or content overview makes sense, but it would help to know the following.
The rate of change
Direction of change
Timedate difference from when he started
In the scene that information is kind of obvious from the environment, so I can see the argument for not having it. But if he was in some largely-unchanging environment, like a panic room or an underground cave or a Sanctum Sanctorum, knowing that information would save him from letting the shuttle go too far and finding himself in the Ordovician. A “home” button might also help to quickly recover from mistakes. Adding these signals would also help distinguish the two modes. They work differently, so they should look different. As it stands, they look identical.
He still (probably) needs future branches
Can Strange scrub the future this way? We don’t see it in the movie. But if so, we have many of the same questions as the Tibet mode future scrubber: Which timeline are we viewing & how probable is it? What other probabilities exist and how does he compare them? This argues for the addition of the future branches from that design.
Selecting the mode
So how does Strange specify the jog dial or shuttle wheel mode?
One cop-out answer is a mental command from Strange. It’s a cop-out because if the Eye responds to mental commands, this whole design exercise is moot, and we’re here to critique, practice, and learn. Not only that, but physical interfaces are more cinegenic, so better to make a concrete interaction for the film.
You might think we could modify the opening finger-tut (see the animated gif, below). But it turns out we need that for another reason: specifying the center and radius-of-effect.
Center and radius-of-effect
In Tibet, the Eye appears to affect just an apple and a tome. But since we see it affecting a whole area in Hong Kong, let’s presume the Eye affects time in a sphere. For the apple and tome, it was affecting a small sphere that included the table, too, it’s just that the table didn’t change in the spans of time we see. So if it works in spheres, how are the center and the radius of the sphere set?
Let’s say the Eye does some simple gaze monitoring to find the salient object at his locus of attention. Then it can center the effect on the thing and automatically set the radius of effect to the thing’s size across likely-to-be scrubbed extents. In Tibet, it’s easy. Apple? Check. Tome? Check. In Hong Kong, he’s focusing on the Sanctum, and its image recognition is smart enough to understand the concept of “this building.”
But the Hong Kong radius stretches out beyond his line of sight, affecting something with a very vague visual and even conceptual definition, that is, “the wrecked neighborhood.” So auto-setting these variables wouldn’t work without reconceiving the Eye as a general artificial intelligence. That would have some massive repercussions throughout the diegesis, so let’s avoid that.
If it’s a manual control, how does he do it? Watch the animated gif above carefully and see he’s got two steps to the “turn Eye on” tut: opening the eye by making an eye shape, and after the aperture opens, spreading his hands apart, or kind of expanding the Eye. In Tibet that spreading motion is slow and close. In Hong Kong it’s faster and farther. That’s enough evidence to say the spread*speed determines the radius. We run into the scales problem of apple-versus-neighborhood that we had in determining the time extents, but make it logarithmic and add some visual feedback and he should be able to pick arbitrary sizes with precision.
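A logarithmic mapping of this kind might look like the Python sketch below. It assumes the gesture has already been reduced to a single normalized value between 0 and 1 (how spread and speed get combined into that value is left open), and the minimum and maximum radii are hypothetical numbers picked to span apple to neighborhood.

```python
R_MIN = 0.1     # meters, hypothetical: roughly apple-sized
R_MAX = 1000.0  # meters, hypothetical: roughly neighborhood-sized


def radius_from_spread(spread: float) -> float:
    """Map a normalized gesture value in [0, 1] to a radius on a log scale."""
    s = max(0.0, min(1.0, spread))
    # Equal steps in the gesture multiply the radius by equal factors.
    return R_MIN * (R_MAX / R_MIN) ** s
```

On this scale the halfway point of the gesture lands near 10 meters rather than 500, which is what gives Strange fine control at the small end without giving up reach at the large end.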
So…back to mode selection
So if we’re committing the “turn on” gesture to specifying the center-and-radius, the only other gesture left is the saucer creation. For a quick reminder, here’s how it works in Tibet.
Since the circle works pretty well for a jog dial, let’s leave this for Tibet mode. A contrasting but related gesture would be to have Strange hold his right hand flat, in a sagittal plane, with the palm facing to his left. (See an illustration, below.) Then he can tilt his hand inside the saucer to reverse or fast forward time, and withdraw his hand from the saucer graphic to leave time moving at the adjusted rate. Let the speed of the saucer indicate speed of change. To map to a clock, tilting to the left would reverse time, and tilting to the right would advance it.
The yank out
There’s one more function we see twice in the Hong Kong scene. Strange is able to pull Mordo and Wong from the reversing time stream by thrusting the saucer toward them. This is a goofy choice of a gesture that makes no semantic sense. It would make much more sense for Strange to keep his saucer hand extended, and use his left hand to pull them from the reversing stream.
So one of the nice things about this movie interface is that, while it doesn’t hold up under the close scrutiny of this blog, the interface to the Eye of Agamotto works while watching the film. The audience sees the apple happen, and gets that gestures + glowing green circle = adjusting time. For that, it works.
That said, we can see improvements that would not affect the script, would not require much more of the actors, and not add too much to post. It could be more consistent and believable.
But we’re not done yet. There’s one other function shown by the Eye of Agamotto when Strange takes it into the Dark Dimension, which is the final mode of the Eye, up next.