Sci-fi interfaces

Evaluating strong AI interfaces in sci-fi

Regular readers have detected a pause. I introduced Colossus to review it, and then went silent. This is because I am wrestling with some foundational ideas on how to proceed. Namely, how do you evaluate the interfaces to speculative strong artificial intelligence? This, finally, is that answer. Or at least a first draft. It’s giant and feels sprawling and almost certainly wrong, but trying to get this perfect is a fool’s errand, and I need to get this out there so we can move on.

This is a draft.

I expect most readers are less interested in this kind of framework than in how it gets applied to their favorite sci-fi AIs. If you’re mostly here for the fiction, skip this one. It’s long.


Oh, hey. Thanks for reading on. Quick initialism glossary:

  - ANI: artificial narrow intelligence
  - AGI: artificial general intelligence
  - ASI: artificial superintelligence

I’ll try to use the longer form of these terms at the beginning of a section to aid comprehension.

What’s strong AI and why just strong AI?

The first division of AI is that between “weak” and “strong” AI. Weak is more properly described as narrow, but regardless of what we call it, it’s the AI of now. That is, software that is beyond the capabilities of humans in some ways, but cannot think like a human, or generalize its learnings to new domains. I don’t think we need to establish a framework for this kind of AI for two reasons.

First, since narrow AI is in the real world, we already have the tools available to evaluate these kinds of AI should we need them. I divide AI into three types: Automatic, Assistant, and Agentive.

So, given these, there’s little need to posit new thinking for ANI. (Noting that some of our questions for general AI can be readily applied to ANI, like the bits about conversational usability.)

Second, ANI represents a small fraction of what’s in the survey. Or to be more precise, ANI is a small fraction of what is essential to the plots of what’s in the survey. Said another way, general AI (AGI) is the most narratively “consequential.” Belaboring an analytical framework for ANI would not have much payoff.

What makes a good strong AI in sci-fi?

Strong AI can be further subdivided into general AI and super AI. General AI is like human intelligence, able to generalize from one domain to new ones. Think of it like computer versions of people. C3PO is general AI. Super AI is orders of magnitude more capable than humans in intelligence tasks, and thereby out of our control. Unity from Colossus: The Forbin Project is a super AI.

Lots of people smarter than me have talked about the risks and strategies to get to a positive AGI/ASI. The discussions involve (and not lightly) the deep core of philosophy, the edges of our moral circles, issues of government and self-determination, conception of truly alien sentience, colonialism, egocentrism, ecology, the Hubble volume, human bias, human cognition, language, and speculations about systems which, by definition, have vastly greater intelligence than us, the ones doing the speculation. It is the most non-trivial of non-trivial problems I can think of.

That said, I think I’ve come to four broad questions we can ask to evaluate a speculative strong AI thoroughly.

  1. Is it believable?
  2. Is it safe?
  3. Is it beneficial?
  4. Is it usable?

In other words, if it’s believable, safe, beneficial, and usable, then we can say it’s a good sci-fi AI. And, if we rank AI on these axes separately, we can begin to have a grade that helps us sort the ones that should be models—or at least bear consideration—from the silly stuff. Kind of like I do for shows, generally, on the rest of the site.

We could ask these questions as-is, informally, and get to some useful answers for an analysis. And most of the time, this is probably the right thing to do. But sci-fi loves to find and really dig in to the exception cases that challenge simple analysis, so let’s take these analytical questions one or two levels deeper.

Setting your expectations, much of this will be a set of questions and considerations to guide the examination of a sci-fi AI rather than a generative formula for producing good AI.


Is it believable?

Most of the discussions of strong AI on the web are in the context of the real world. So we first have to note that, in sci-fi, an additional first pass is one of believability: Could this strong AI exist and behave in the way it is depicted in the show? If not, it may not bear further examination. Ra One is a movie with a very silly evil “AI” in it that does not bear serious examination as a model for real-world design.

The Logan’s Run Übercomputer: Not believable.

For believability, we look at things like internal consistency, match to the real world, and implied causality within the story. In Logan’s Run, for instance, the Übercomputer hears something it doesn’t expect, and as a result, explodes and causes an entire underground city to collapse. Not exactly believable. Stupid, even.

One caveat: Sci-fi is built around some novum, some new thing that the rest of the story hangs on. And computer scientists in the real world aren’t certain how we’ll get to general AI, so it’s a lot to expect that writers are going to figure it out and then hide a blueprint in a script. So let’s admit that the creation of AI often has to get a pass. (Which is not to say this is good, see the Untold AI series for how that entails its own risks.)

Believability is an extradiegetic judgment—one we as an audience make about the show, and that characters in the show could not make. The three remaining questions are diegetic, meaning ones that characters in the story could assess and provide clues about: Is it safe, beneficial, and usable?

Is it safe?

Neither its benefits nor its usability matter if a strong AI is not safe. Sometimes, this is obvious. Wall·E is safe. The Terminator is not. But how a thing is or is not safe requires closer examination. Answering this won’t always need a full-fledged framework, but I think we can get a long way by looking at its goals and understanding what it can and can’t do in pursuit of those goals.

https://www.youtube.com/watch?v=J91ti_MpdHA

What are its goals?

AGI will be more powerful than humans in some way, and that advantage is dangerous enough. But AGI stands to evolve into ASI, by which time it will be out of our control and human fate will hang in the balance. If its goals are aligned with thriving life from the start, all will be good. If poorly-stated goals can be corrected, that’s at least a positive outcome. If its goals are bad and cannot be corrected, we may become raw materials, or a threat to be…uh…minimized. So we should identify its goals as best we can and ask…

Why “life” and not “people?” Readers are likely to be familiar with Asimov’s laws of robotics, which prioritize human beings above all else. But we know that humans thrive in a rich ecology of lots of other life, so this question rightfully expands generally to life. It gets complicated of course, because we don’t want, say, the Black Plague bacterium Yersinia pestis to thrive. But “life” is still a better scope than just “human beings.”

One of the more troubling problems with asking an AI to achieve broad goals is how it goes about pursuing those goals. A human tasked with “making people happy” would reject an interpretation that we should stimulate the pleasure center of everyone’s brains to make it happen. (Such unreasonable tactics are called perverse instantiations in much of the literature, if you want to read more.) 
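To make the idea concrete, here’s a toy sketch (not from the article, and all names hypothetical) of how a purely literal optimizer lands on a perverse instantiation, and how a crude stand-in for “reasonableness” changes its choice:

```python
# Toy illustration of perverse instantiation: an optimizer that takes a
# goal literally picks a tactic humans would reject. All tactic names and
# scores here are hypothetical, purely for illustration.

# Candidate tactics and the "happiness score" a naive metric assigns them.
tactics = {
    "improve healthcare": 7,
    "reduce poverty": 8,
    "stimulate everyone's pleasure centers": 100,  # maximizes the literal metric
}

# A literal optimizer: pick whatever scores highest on the stated goal.
naive_choice = max(tactics, key=tactics.get)

# A crude "reasonableness" check standing in for the values an AGI would
# need; here it's just a blocklist of tactics humans reject outright.
unreasonable = {"stimulate everyone's pleasure centers"}
constrained_choice = max(
    (t for t in tactics if t not in unreasonable), key=tactics.get
)

print(naive_choice)        # the perverse instantiation
print(constrained_choice)  # a tactic a human would endorse
```

The blocklist is, of course, far too brittle for a real system; the hard problem is equipping the AI with values that generalize to tactics no one thought to forbid.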

An AGI needs to be equipped such that it can determine the reasonableness of a given tactic. In discussions this often entails an examination of the values that an AI is equipped with, but that’s rarely expressed directly by characters in sci-fi. Sometimes this is easy, like when Ash decides he should murder Ripley. But sometimes it’s not. Humans don’t always agree with each other about what is reasonable. That’s part of why we have judicial systems around the world. And the calculus becomes troubling when we have very high stakes, like anthropogenic disaster, and humans who don’t want to change their way of life. What’s reasonable then?

Robocop: Come quietly or there will be… trouble.

What can it do? (Capabilities)

Once we know what its goals are, we should understand what it can do to achieve those goals. The first capabilities are about the goals themselves.

Whatever goals AGI starts with will almost certainly need to evolve, if for no other reason than that circumstances will change over time. It may achieve its goals and need to stop. But it may also be that the original goal was later determined to be poorly worded, given the AGI’s increasing understanding.

Again, this isn’t an easy call. An unconscious patient can’t vet an AI’s decision to amputate, even if it would save their life. A demagogue wouldn’t approve a plan to bring them to justice. But if an AI decided the ideal place for a hydroelectric dam was on top of a village, those villagers should be notified and negotiated with before they are relocated. 

One version of The Machine, Person of Interest

When looking at what it can do, we should also specifically check against the list of “instrumental convergences.” These are a set of capabilities, the arguments go, that any strong AI will want to develop in order to achieve its goals, but which carry a profound risk when an AGI becomes an ASI. (Here I am slightly restructuring Bostrom’s list from Superintelligence; see sketchnotes.)

These aren’t the only dangerous capabilities an AI could develop, but some probable ones. This will give us a picture of how powerful the AI is and what it can bring to bear in pursuit of its goals.

What can’t it do? (Constraints)

Any time we see these instrumental capabilities in an AI, it is on its way to becoming harder to control. We should look for how these capabilities are limited. If they’re not limited, it’s a problem.

Why was I not programmed to hug back?

But we should also look quite generally at the limits of its capabilities. Adhering to “reasonableness” is one check. But there are others.

Asimov’s Laws of Robotics come again to mind, but they are not sufficient, as his own stories are meant to show. That raises the question of how sound the rules are, and how they can be circumvented. Is the AI able to break the spirit of the law while obeying the letter? (This is a form of perverse instantiation.)

Tau felt a lot of pain, but could push through.

Is it controllable?

The questions of capabilities and constraints cover how it is controlled “internally,” by well-stated goals, humanistic values, and constraints. But if an AGI winds up with some sort of digital Dunning-Kruger effect, and it thinks its goals and methods are fine, but we don’t, it needs to be subject to external control.

Sometimes, it’s not a panic button that’s needed, but just a course correction, where we might want to modify its goals or add some nuance to its understanding of the world.

Both of these raise the question of authority. Who gets to modify the AI?

This will entail issues of self-determination and even slavery. Gort had to obey Klaatu. Robbie had to obey Morbius. These two examples were arguably non-sentient automatons, but when we get to more full-fledged sentience, obedience and captivity become an immediate issue. Samantha in Her was fully sentient, but she was sold on the market into the servitude of a human. She didn’t stay that way of course, but the movie completely bypassed the fact that she was trafficked.

Victim loading. Her

Should criminals be able to adjust the police bot’s goals? Probably not. What if the determination of “criminal” is unfairly biased, and has no human recourse? What if the AI is a tool of oppressors? The answers are less clear. Is the right answer “all of humanity?” Probably? But how can an AI answer to a superorganism?

By understanding the AI’s goals, capabilities, constraints, and controllability, we would come to an understanding of the “nature” of the AI and whether or not it poses a threat to life.

I am Gooooort. The Day the Earth Stood Still.

Is it beneficial?

Next, we should discuss if it’s beneficial. If an AI isn’t better than humans at at least one thing, there’s little point in building it. But of course, it’s not just about its advantage; we also need to look at everything surrounding that advantage.

This will involve some loose tallying of the costs and benefits. It will almost certainly involve a question of scope. That is, for whom is it beneficial, and how, and when? For whom is it detrimental? How? When? I mentioned above how Asimov’s Laws of Robotics privilege human life over all else, even when humans deeply depend on a complex ecosystem of other kinds of life. If it destroys non-human life as potential threats to us, it will diminish us in many foundational ways. (And of course, in sci-fi there are often explicitly alien forms of life, so it’s going to be complicated.)

V-Ger. Life? Star Trek: The Motion Picture

It will also entail a discussion of the scope of time. Receiving injections from a hypodermic needle actually does us harm in the short term, but presuming that hypodermic is filled with medicine that we need, it benefits us at a longer scale of time. We don’t want an AI so focused on preventing damage that it prevents us from receiving shots that we might need. Of course, if we could avoid the needle and still overcome disease, that would be best, but the problematic cases are where a short-term cost is worth the long-term benefits. Who determines the extents of that trade-off? How much short-term damage is too much? What is acceptable? How long a horizon for payoff is too long?

This ties in to the controllability issue raised above. Humans, answering largely to their own natures, have created quite an extinction-level mess of things to date. Isn’t the largest promise of ASI that it will be able to save us from ourselves? In that case, do we want it to be perfectly bendable to human will? 

“I think you ought to know I’m feeling very depressed.” Hitchhiker’s Guide to the Galaxy.

Is it usable?

Finally, we should address whether it is usable. This is part of the raison d’être of this site, after all. In many cases it may not at first make sense to ask this question. What would it mean to ask if Skynet is usable? It doesn’t really have an interface. But interactions with most sci-fi AIs are conversational—even Skynet in the later Terminator movies talks to its victims—and so we can at least address whether it is easy to talk to, even if it’s hostile and long out of control.

Basic functions

Once we understand these basics, we should look at communications to and from the AI.

General communications

The large majority of AI in the Untold AI database communicate to people in their stories via natural, spoken language. An AI that speaks needs to adhere to human speech norms, and more.

Natural language interaction

  1. The Maxim of Quality: I will provide truthful, “fair witness” information.
  2. The Maxim of Quantity: I will provide as much information as is needed and no more.
  3. The Maxim of Relation: I will speak only what is relevant to the discussion or context.
  4. The Maxim of Manner: I will speak plainly and understandably.
The Liar’s Paradox? But I’m getting a 404 error searching for it…

Social interaction

An AI rarely just interacts with a single individual. It operates in a society of individuals, and that implies its own set of skills.

Janet! The Good Place

Ethical and legal interaction

Norms are just one set of the many rules by which we expect intelligent actors to behave. We also expect them to act ethically and, for the most part, legally. (Though perfect adherence to the law was never really possible for a human, and it will be very interesting to see how any intelligence required to adhere perfectly to laws will in turn affect the law. But I digress.) If this hasn’t been covered in the considerations of capabilities and constraints, we should look for and examine instances where it is asked to do questionable things.

Conveying safety

Some AIs, like Rick Sanchez’ butter-passing robot, aren’t really a safety concern, but most of the ones in sci-fi are.

Welcome to the club, pal. Rick & Morty

Performance


I think that this covers what it means to interface with an AI. What am I not seeing? What is this list missing? This is my kind of thinkwork. If it’s yours, too, let’s talk. Let’s make this better. For now, though, I’m going with this draft as I take a turn back to Colossus.

Note: No sci-fi AI is going to show all of this

There is little chance that all of these questions will be answered in a given show. The odds increase as you go from short-form like film to longer-form like franchises and television series, but regardless of how much material we’ve got to work with, we now have a set of questions to apply to each AI, compare it to others, and state more concretely if and how it is good.
