8 Reasons The Voight-Kampff Machine is shit (and a redesign to fix it)

10 Dec 2019 by Christopher Noessel

Distinguishing replicants from humans is a tricky business. Since they are indistinguishable biologically, it requires an empathy test, during which the subject hears empathy-eliciting scenarios and is watched carefully for telltale signs such as, “capillary dilation—the so-called blush response…fluctuation of the pupil…involuntary dilation of the iris.” To aid in this examination, the blade runner uses a portable machine called the Voight-Kampff machine, named, presumably, for its inventors.

The device is the size of a thick laptop computer, and rests flat on the table between the blade runner and subject. When the blade runner prepares the machine for the test, they turn it on, and a small adjustable armature rises from the machine, the end of which is an intricate piece of hardware, housing a powerful camera, glowing red.

The blade runner trains this camera on one of the subject’s eyes. Then, while reading from the playbook of scenarios, they keep watch on a large monitor, which shows a magnified image of the subject’s eye. (Ostensibly, anyway. More on this below.) A small bellows on the subject’s side of the machine raises and lowers. On the blade runner’s side of the machine, a row of lights reflects the volume of the subject’s speech. Three square, white buttons sit to the right of the main monitor. In Leon’s test we see Holden press the leftmost of the three, and the iris in the monitor becomes brighter, illuminated from some unseen light source. The purpose of the other two square buttons is unknown. Two smaller monochrome monitors sit to the left of the main monitor, showing moving but otherwise inscrutable forms of information.

In theory, the system allows the blade runner to more easily watch for the minute telltale changes in the eye and blush response, while keeping a comfortable social distance from the subject. Substandard responses reveal a lack of empathy and thereby a high probability that the subject is a replicant. Simple! But on review, it’s shit. I know this is going to upset fans, so let me enumerate the reasons, and then propose a better solution.

-2. Wouldn’t a genetic test make more sense?

If the replicants are genetically engineered for short lives, wouldn’t a genetic test make more sense? Take a drop of blood and look for markers of incredibly short telomeres or something.

-1. Wouldn’t an fMRI make more sense?

An fMRI would reveal empathic responses in the inferior frontal gyrus, or cognitive responses in the ventromedial prefrontal gyrus. (The brain structures responsible for these responses.) Certainly more expensive, but more certain.

0. Wouldn’t a metal detector make more sense?

If you are testing employees to detect which ones are the murdery ones and which ones aren’t, you might want to test whether they are bringing a tool of murder with them. Because once they’re found out, they might want to murder you. This scene should be rewritten such that Leon leaps across the desk and strangles Holden, IMHO. It would make him, and other blade runners, seem much more feral and unpredictable.

(OK, those aren’t interface issues but seriously wtf. Onward.)

1. Labels, people

Controls need labels. Especially when the buttons have no natural affordance and the costs of experimentation to discover the function are high. Remembering the functions of unlabeled controls adds to the cognitive load for a user who should be focusing on the person across the table. An illuminated button helps signal the state, so that, at least, is something.

2. It should be less intimidating

The physical design is quite intimidating: The way it puts a barrier in between the blade runner and subject. The fact that all the displays point away from the subject. The weird intricacy of the camera, its ominous HAL-like red glow. Regular readers may note that the eyepiece is red-on-black and pointy. That is to say, it is aposematic. That is to say, it looks evil. That is to say, intimidating.

I’m no emotion-scientist, but I’m pretty sure that if you’re testing for empathy, you don’t want to complicate things by introducing intimidation into the equation. Yes, yes, yes, the machine works by making the subject feel like they have to defend themselves from the accusations in the ethical dilemmas, but that stress should come from the content, not the machine.

2a. Holden should be less intimidating and not tip his hand

While we’re on this point, let me add that Holden should be less intimidating, too. When Holden tells Leon that a tortoise and a turtle are the same thing, (Narrator: They aren’t) he happens to glance down at the machine. At that moment, Leon says, “I’ve never seen a turtle,” a light shines on the pupil and the iris contracts. Holden sees this and then gets all “ok, replicant” and becomes hostile toward Leon.

In case it needs saying: If you are trying to tell whether the person across from you is a murderous replicant, and you suddenly think the answer is yes, you do not tip your hand and let them know what you know. Because they will no longer have a reason to hide their murderyness. Because they will murder you, and then escape, to murder again. That’s like, blade runner 101, HOLDEN.

3. It should display history

The glance moment points out another flaw in the interface. Holden happens to be looking down at the machine at that moment. If he wasn’t paying attention, he would have missed the signal. The machine needs to display the interview over time, and draw his attention to troublesome moments. That way, when his attention returns to the machine, he can see that something important happened, even if it’s not happening now, and tell at a glance what the thing was.

4. It should track the subject’s eyes

Holden asks Leon to stay very still. But people are bound to involuntarily move as their attention drifts to the content of the empathy dilemmas. Are we going to add noncompliance-guilt to the list of emotional complications? Use visual recognition algorithms and high-resolution cameras to just track the subject’s eyes no matter how they shift in their seat.

5. Really? A bellows?

The bellows doesn’t make much sense either. I don’t believe it could, at the distance it sits from the subject, help detect “capillary dilation” or “ophthalmological measurements”. But it’s certainly creepy and Terry Gilliam-esque. It adds to the pointless intimidation.

6. It should show the actual subject’s eye

The eye color that appears on the monitor (hazel) matches neither Leon’s (a striking blue) nor Rachel’s (a rich brown). Hat tip to Typeset in the Future for this observation. His is a great review.

7. It should visualize things in ways that make it easy to detect differences in key measurements

Even if the inky, dancing black blob is meant to convey some sort of information, the shape is too organic for anyone to make meaningful readings from it. Like seriously, what is this meant to convey?

The spectrograph to the left looks a little more convincing, but it still requires the blade runner to do all the work of recognizing when things are out of expected ranges.

8. The machine should, you know, help them

The machine asks its blade runner to do a lot of work to use it. This is visual work and memory work and even work estimating when things are out of norms. But this is all something the machine could help them with. Fortunately, this is a tractable problem, using the mighty powers of logic and design.

Pupillary diameter

People are notoriously bad at estimating the sizes of things by sight. Computers, however, are good at it. Help the blade runner by providing a measurement of the thing they are watching for: pupillary diameter. (n.b. The script speaks of both iris constriction and pupillary diameter, but these are the same thing.) Keep it convincing and looking cool by having this be an overlay on the live video of the subject’s eye.

So now there’s some precision to work with. But as noted above, we don’t want to burden the user’s memory with having to remember stuff, and we don’t want them to just be glued to the screen, hoping they don’t miss something important. People are terrible at vigilance tasks. Computers are great at them. The machine should track and display the information from the whole session.

Note that the display illustrates radius, but displays diameter. That buys some efficiencies in the final interface.

Now, with the data-over-time, the user can glance to see what’s been happening and a precise comparison of that measurement over time. But, tracking in detail, we quickly run out of screen real estate. So let’s break the display into increments with differing scales.

There may be more useful increments, but microseconds and seconds feel pretty convincing, with the leftmost column compressing gradually over time to show everything from the beginning of the interview. Now the user has a whole picture to look at. But this still burdens them with noticing when these measurements are out of normal human ranges. So, let’s plot the threshold, and note when measurements fall outside of that. In this case, it feels right that replicants display less than normal pupillary dilation, so it’s a lower-boundary threshold. The interface should highlight when the measurement dips below this.
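That lower-boundary check is simple enough to sketch in code. To be clear, this is a toy illustration and nothing from the film: the function name `flag_low_dilation`, the sample readings, and the 3.0 mm threshold are all invented for the example, and a real threshold would have to come from actual human norms.

```python
from typing import List, Tuple


def flag_low_dilation(
    samples: List[Tuple[float, float]], threshold_mm: float
) -> List[Tuple[float, float]]:
    """Return the (time_s, diameter_mm) samples that dip below the threshold.

    The threshold stands in for the lower boundary of normal human
    response. Its value is a placeholder, not a real norm.
    """
    return [(t, d) for (t, d) in samples if d < threshold_mm]


# Hypothetical readings: (seconds into the interview, pupil diameter in mm).
readings = [(0.0, 4.1), (0.5, 4.3), (1.0, 2.8), (1.5, 2.9), (2.0, 4.0)]
flagged = flag_low_dilation(readings, threshold_mm=3.0)
print(flagged)
```

The flagged samples are the moments the display would highlight, so the blade runner can see them whenever their attention returns to the machine.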

Blush

I think that covers everything for the pupillary diameter. The other measurement mentioned in the dialogue is capillary dilation of the face, or the “so-called blush response.” As we did for pupillary diameter, let’s also show a measurement of the subject’s skin temperature over time as a line chart. (You might think skin color is a more natural measurement, but for replicants with a darker skin tone than our two pasty examples Leon and Rachel, temperature via infrared is a more reliable metric.) For visual interest, let’s show thumbnails from the video. We can augment the image with degree-of-blush. Reduce the image to high contrast grayscale, use visual recognition to isolate the face, and then provide an overlay to the face that illustrates the degree of blush.

But again, we’re not just looking for blush changes. No, we’re looking for blush compared to human norms for the test. It would look different if we were looking for more blushing in our subject than humans, but since the replicants are less empathetic than humans, we would want to compare and highlight measurements below a threshold. In the thumbnails, the background can be colored to show the median for expected norms, to make comparisons to the face easy. (Shown in the drawing to the right, below.) If the face looks too pale compared to the norm, that’s an indication that we might be looking at a replicant. Or a psychopath.

So now we have solid displays that help the blade runner detect pupillary diameter and blush over time. But it’s not that any diameter changes or blushing is bad. The idea is to detect whether the subject has less of a reaction than norms to what the blade runner is saying. The display should be annotating what the blade runner has said at each moment in time. And since human psychology is a complex thing, it should also track video of the blade runner’s expressions as well, since, as we see above, not all blade runners are able to maintain a poker face. HOLDEN.

Anyway, we can use the same thumbnail display of the face, without augmentation. Below that we can display the waveform (because they look cool), and speech-to-text the words that are being spoken. To ensure that the blade runner’s administration of the text is not unduly influencing the results, let’s add an overlay to the ideal intonation targets. Despite evidence in the film, let’s presume Holden is a trained professional, and he does not stray from those targets, so let’s skip designing the highlight and recourse-for-infraction for now.

Finally, since they’re working from a structured script, we can provide a “chapter” marker at the bottom for easy reference later.

Now we can put it all together, and it looks like this. One last thing we can do to help the blade runner is to highlight when all the signals indicate replicant-ness at once. This signal can’t be too much, or replicants being tested would know from the light on the blade runner’s face when their jig is up, and try to flee. Or murder. HOLDEN.

For this comp, I added a gray overlay to the column where pupillary and blush responses both indicated trouble. A visual designer would find some more elegant treatment.
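The "all signals at once" rule behind that gray overlay could be sketched like this. It assumes each time column has already been reduced to a yes/no flag per signal. The function name and the sample flags are made up for illustration.

```python
from typing import List


def columns_to_highlight(pupil_low: List[bool], blush_low: List[bool]) -> List[int]:
    """Return the indices of time columns where both signals fall below norms."""
    return [i for i, (p, b) in enumerate(zip(pupil_low, blush_low)) if p and b]


# Hypothetical per-column flags for a five-column stretch of the interview.
pupil_low = [False, True, True, False, True]
blush_low = [False, False, True, False, True]
highlighted = columns_to_highlight(pupil_low, blush_low)
print(highlighted)
```

Requiring both flags keeps the overlay rare, which suits the goal of a signal that is not too loud.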

If we were redesigning this from scratch, we could specify a wide display to accommodate this width. But if we are trying to squeeze this display into the existing prop from the movie, here’s how we could do it.

Note the added labels for the white squares. I picked some labels that would make sense in the context. “Calibrate” and “record” should be obvious. The idea behind “mark” is an easy button for the blade runner to press when they see something that looks weird, like when doctors manually annotate cardiograph output.

Lying to Leon

There’s one more thing we can add to the machine that would help out, and that’s a display for the subject. Recall the machine is meant to test for replicant-ness, which happens to equate to murdery-ness. A positive result from the machine needs to be handled carefully so what happens to Holden in the movie doesn’t happen. I mentioned making the positive-overlay subtle above, but we can also make a placebo display on the subject’s side of the interface.

The visual hierarchy of this should make the subject feel like its purpose is to help them, but the real purpose is to make them think that everything’s fine. Given the script, I’d say a teleprompt of the empathy dilemma should take up the majority of this display. Oh, they think, this is to help me understand what’s being said, like a closed caption. Below the teleprompt, at a much smaller scale, a bar at the bottom is the real point.

On the left of this bar, a live waveform of the audio in the room helps the subject know that the machine is testing things live. In the middle, we can put one of those bouncy fuiget displays that clutters so many sci-fi interfaces. It’s there to be inscrutable, but convince the subject that the machine is really sophisticated. (Hey, a diegetic fuiget!) Lastly, and this is the important part, an area shows that everything is “within range.” This tells the subject that they can be at ease. This is good for the human subject, because they know they’re innocent. And if it’s a replicant subject, this false comfort protects the blade runner from sudden murder. This text might flicker or change occasionally to something ambiguous like “at range,” to convey that it is responding to real world input, but it would never change to something incriminating.
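The key property of that status area is that the real measurements never reach it. Here is a minimal sketch of the idea, assuming the occasional flicker is driven by a random draw. The function name `placebo_status` and the 10% flicker chance are invented for the example.

```python
import random


def placebo_status(rng: random.Random, flicker_chance: float = 0.1) -> str:
    """Pick the text for the subject-facing status area.

    The real measurements are deliberately not an input: whatever the
    machine has detected, the subject only ever sees reassurance.
    """
    return "at range" if rng.random() < flicker_chance else "within range"


rng = random.Random(2019)  # seeded only so the sketch is repeatable
statuses = [placebo_status(rng) for _ in range(20)]
print(statuses)
```

Because the function takes no test data at all, there is no code path by which it could show something incriminating.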

This way, once the blade runner has the data to confirm that the subject is a replicant, they can continue to the end of the module as if everything was normal, thank the replicant for their time, and let them leave the room believing they passed the test. Then the results can be sent to the precinct and authorizations returned so retirement can be planned with the added benefit of the element of surprise.

OK

Look, I’m sad about this, too. The Voight-Kampff machine is cool. It fits very well within the art direction of the Blade Runner universe. This coolness burned the machine into my memory when I saw this film the first dozen times, but despite that, it just doesn’t stand up to inspection. It’s not hopeless, but does need a lot of thinkwork and design to make it really fit to task, and convincing to us in the audience.

IQ Testing

1 Nov 2018 by Christopher Noessel

When Joe is processed after his arrest, he is taken to a general IQ testing facility. He sits in a chair wearing headphones. A recorded voice asks, “If you have one bucket that holds two gallons, and another bucket that holds five gallons, how many buckets do you have?” Into a microphone he says, incredulous that this is a question, “Two?” The recorded voice says, “Thank you!”

Joe looks to his left to see another subject is trying to put a square blue peg into the middle round hole of a panel and of course failing. Joe looks to his right, to see another subject with a triangular green peg in hand that he’s trying to put into the round middle hole in his interface. Small colored bulbs above each hole are unlit, but they match the colors of the matching blocks, so let’s presume they illuminate when the correct peg is inserted. When you look closely, it’s also apparent that the blocks are tethered to the panel so they’re not lost, and each peg is tethered directly below its matching hole. So there are lots and lots of cues that would let a subject figure it out. And yet, they are not. The subject to Joe’s right even eyes Joe suspiciously and turns his body to cover his test so Joe won’t try and crib…uh…“answers.”

Comedy

The comedy in the scene comes from how rudimentary these challenges are. Most toddlers could complete the shape test. Even if you couldn’t figure out the shapes, you could match the colors, i.e. the blue object goes in the hole under the blue bulb. Most preschoolers could answer the spoken challenge. It underscores the stupidity of this world that generalized IQ tests for adults test below grade school levels.

IQ Testing

Since Binet invented the first one in 1904, IQ testing has had a long and problematic past (racism and using it to justify eugenic arguments, just for instance) but it can have a rational goal: How do we measure the intelligence of a set of people (students in a classroom, or applicants to intelligence jobs) for strategic decisions about aptitude, assistance, and improvement? But intelligence is a very slippery concept, and complicated to study, much less test. The good news in this case is that the citizens of Idiocracy don’t have very sophisticated intellects, so very basic tests of intelligence should suffice.

Some nice things

So, that said, the shape test has some nice aspects. The panel is angled so the holes are visible and targetable, without being so vertical it’s easy to drop the pegs while manipulating them. The panel is plenty thick for durability and cleaning. The speech-to-text tech seems to work perfectly, unlike the errors and bad design that riddle most technologies in Idiocracy.

A garden path match

There’s an interesting question of affordances in the device. You can see in the image above that the yellow round block fits just fine in the square hole. Ordinarily, a designer would want to prevent errors like this by, say, increasing the diameter of the round peg (and its hole) so that it couldn’t be inserted into the square hole. That version of the test would just test the time it took, even by trial-and-error, to match pegs to their matching holes, and then you could rank subjects by time-to-completion. But by allowing the round peg to fit in the square hole, you complicate the test with a “garden path” branch where some subjects can get lost in what they think is a successful subtask. This makes it harder to compare subjects fairly, because a subject who wandered down this path pays an unfair price in their time-to-complete compared with one who did not.
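For the simpler version of the test, the ranking step is about as small as code gets. This is a sketch only: the subject labels and times are made up, and it assumes the only thing measured is seconds to completion.

```python
from typing import Dict, List


def rank_by_completion_time(times_s: Dict[str, float]) -> List[str]:
    """Return subject names ordered from fastest to slowest."""
    return sorted(times_s, key=times_s.get)


# Hypothetical completion times in seconds.
times = {"subject_a": 48.0, "subject_b": 31.5, "subject_c": 75.2}
ranking = rank_by_completion_time(times)
print(ranking)
```

A ranking like this is only fair if every subject faces the same path through the task, which is exactly what the garden-path branch undermines.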

Another complication is that this test has so many different clues. Do they notice the tethers? Do subjects notice the colored bulbs? (What about color blind subjects?) Having it test cognitive skills as well as fine-motor manipulation skills as well as perception skills seems quite complicated and less likely to enable fair comparisons.

We must always scrutinize IQ tests because people put so much stock in them and it can be very much to an individual’s detriment. Designers of these tests ought to instrument them carefully for passive and active feedback about when the test itself is proving to be problematic.

Challenging the “superintelligent?”

A larger failing of the test is that it doesn’t challenge Joe at all. All his results would tell him is that he’s much much more intelligent than these tests are built for. Fair enough, there’s nothing in the world of Idiocracy which would indicate a need to test for superintelligence among the population, but this test had to be built by someone(s), generations ago. Could they not even have the test work on someone as smart as themselves? That’s all it would need to test Joe. But we live in a world that should be quite cautious about the emergence of a superintelligence. It would be comforting to imagine that we could test for that. Maybe we should include the Millennium Problems at the end of every test. Just in case.

Another Idiot Test

As “luck” would have it, Trump tweeted an IQ test just this morning. (I don’t want to link to it to directly add any fuel to his fire, but you can Google it easily.) It’s an outrageous political video ad. As you watch it:

Do you believe that a single anecdote about a troubled, psychotic individual is generalizable to everyone with brown skin? Or even to everyone with brown skin who is not American and seeking legal asylum in the U.S.?
Do you ignore the evidence of the past decades (and the last week) that shows it’s conservative white males who are much more of a problem? (Noting that Vox is a liberal-leaning publication, but look at the article’s citations.)
Can you tell that the war drums under the ad are there only to make you feel scared, appealing to your emotions with cinematic tricks?
Do you uncritically fall for implicature and the slippery slope fallacy?

If the answers to all these are yes, well, sorry. You’ve failed an IQ test put to you by one of the most blatantly racist political ads since Willie Horton. (Not many ads warrant a deathbed statement of regret, but that one did.) Maybe it’s best you take the rest of the week off treating yourself. Leave town. Take a road trip somewhere. Eat some ice cream.

For the rest of you, congratulations on passing the test. We have 5 days until the election. Kick the racist bastards and the bastards enabling the racist bastards out.

Grade Board

2 Apr 2014 by Christopher Noessel

When students want to know the results of their tests, they do so by a public interface. A large, tiled screen is mounted to a recessed section of wall in a courtyard. The display is divided into a grid of five columns and three rows. Each cell contains one student’s results for one test, as a percentage. One cell displays an ad for military service. Another provides a reminder for the upcoming sports game. Four keyboards are situated below the screens at waist level.

To find her score, Carmen approaches one of the keyboards and enters some identifying data. In response, the column above her keyboard displays her score and moves the data in the other cells up. There is no way to learn of one’s test scores privately. This hits Johnny particularly hard when he checks his scores to find he has earned 35% on his Math Final, a failing grade.

Worse, his friend Carl is able to walk up to the keyboard and with a few key presses, interrupt every other student looking at the grades, and fill the entire screen with Johnny’s score for all to see, with the failing number blinking red and white, ridiculing him before his peers. After a reprimand from Johnny, Carl returns the display to normal with the press of a button.

Is ANSI the right input?

The keyboard would be a pain to keep clean, and you’d figure that a student ID would be a unique-and-memorable enough token. Does an entire ANSI keyboard need to be there? Wouldn’t a number pad be enough? But why a manual input at all? Nowadays you’d expect some near-field communication, or biometric token, which would obviate the keyboard entirely.

Is publicizing grades OK?

So there are input and interaction improvements to be made, for sure. But there are more important issues to talk about here. Yes, students can accomplish one task with the interface well enough: Checking grades. But what about the giant, public output?

It’s fulfilling one of the dystopian goals of the fascist society in which the story takes place, which is that might makes right. Carl is a bully (even if Johnny’s friend) and in the culture of Starship Troopers, if he wants to increase Johnny’s public humiliation, why not? Johnny needs to study harder, take it on the chin, or make Carl stop. In this regard, the interface satisfies both the students’ task and the culture’s…um…values.

I originally wanted to counter that with a strong statement that, “But that’s not us.” After all, modern federal privacy laws in the United States forbid this public display as a violation of students’ privacy. (See FERPA laws.) But apparently not everyone believes this. A look on debate.org (at the time of writing) shows that opinion is perfectly split on the topic. I could lay out my thoughts on which side is better for learning, but it’s really beyond the scope of this blog to build a case for either side of Lakoff’s Moral Politics.

You’re Doing More Than You Think You’re Doing

But it’s worth noting the scope of these issues at hand. This seems at first to be an interface just about checking grades, but when you look at the ecosystem in which it operates, it actually illustrates and reinforces a culture’s core virtues. The interface is sometimes not just the interface. Its designers are more than flowchart monkeys.

Sci-fi interfaces

Stop watching sci-fi. Start using it.
