Gendered AI, initial results

Men are machines. Women are bodies.
Male is extreme. Women are nuance.
General AI has gender. Other AI does not.
Male is free-will. Machine is subservience.
Male is default. Women when it’s necessary.

At least in screen sci-fi.

Let me explain.

In November of 2018, a tweet thread between Chris Geison and Kathy Baxter called my attention to questions about the gender of AI in sci-fi. Baxter noted that most AI is male, and that female AI is often quite subservient or sexualized. In this thread, Geison added Cathy Pearl’s observation that embodied AI is often female, while male AI is more often disembodied and regarded as a peer.

I already had a “database” (read: Google Sheet) of AI in screen sci-fi from Untold AI, my 2018 study of the stories screen sci-fi doesn’t tell, but should. So, I thought I could provide some formal analysis to this Gendered AI discussion. To that end I’ve added around 325 AI characters to the Google Sheet, and run some analyses. This series of posts will break it all down for you.

Oh, we’ll come back to this little “guy.”

Now, it can get a little dry to talk about percentages and comparisons and distributions, so I’m going to do my best to keep tying things back to the shows and the characters and the upshot of all this analysis. But the way we get to that upshot is through the numbers, so stick with me. For this first post, I’m going to share what I captured, and what counts as an AI character for purposes of this study.

The following is true in the survey as of 08 APR 2019. The live data, available in Google Sheets, may be updated from this.

The data set

  • 327 AI characters from science fiction (see the full list in the live sheet)
  • Movies and television shows from 1927 (Metropolis)–2018 (Upgrade)

Call to action: Of course I missed some movies and TV shows. Add them in the comments, including a link to their IMDB page.

The survey that drives this site has always focused on screen sci-fi for its ability to depict interfaces that can be reviewed. Literature is much more free to experiment with ideas than screen sci-fi, and so will have lots of additional examples, but won’t appear in the survey.

Each character is tagged multiple ways. More detail on particular attributes below.

  • Movie or Show Title and Episode if appropriate
  • Year
  • Name
  • Embodiment
  • Physicality/Virtuality
  • Gender Presentation (which is a roll-up of four separately tracked variables)
    • Appearance or evidence of primary sex characteristics
    • Appearance of secondary sex characteristics
    • Voice
    • Pronouns used by other characters
  • Subservience to humans
  • Germane-ness of gender (more on this in its own section)
  • Goodness
  • If not free-willed, the gender of the master
  • Category of AI (Narrow, General, or Super)
  • Whether their gender presentation changes over time
  • Genesis, or how the AI came to be. This is mostly used to distinguish AI that are copies of humans (whose gender would thereby be inherited).

Call to action: If you think there’s some critical attribute that I’m missing, pipe up in the comments. I can’t promise I can get to it before the next post, but I can consider it as a future enhancement.

Yes, but which Skynet?

With the exception of the flag marking changed genders, when characters change other attributes over the course of their stories, they are tagged for their final state. For example, the Maschinenmensch from Metropolis begins as an anthropomorphic robot, but after Rotwang transfers Maria’s likeness to it, it becomes indistinguishable from human, and so is tagged as such.

If you’re looking at the Sheets data, you’ll see that text values have corresponding numerical columns to allow for easy sorting and graphing data, but I tried to gray them so they don’t distract from a reading of the raw data.

Full disclosure: Possible problems with this data

  • Sci-fi is a vast supergenre. There are certainly examples missing from the survey, so it should not be regarded as exhaustive. (I tried to get as many as I could.)
    • I generally target well-known examples rather than limited-release or student projects.
    • The sci-fi interfaces blog usually eschews comedy that routinely breaks the 4th wall (e.g. Spaceballs), as this makes for very complicated analysis, and so the survey will be missing these examples as well.
    • I only speak English fluently, and so have only reviewed shows in English, with English dubbing, or with English subtitling.
  • I am not a data scientist. I’m a smart guy who tried his best, but may have made some errors in the formulas.
  • I am not an expert in gender issues. I may make unintentional errors in discussing or categorizing genders, use insensitive language, or have naive errors in my thinking. I have engaged a professional sensitivity review, but of course they might not catch everything, either.
  • I am a progressive, liberal, (imperfect, see above) feminist. Though I tried not to, my bias may have colored how I coded the examples and of course the interpretation of this data.
  • I have to go on a LOT of common-case presumptions. For example, men can have breasts for many reasons, but I used the presence of breasts as one marker of female-ness. I suspect this is a disservice to the real complexity of gender and sex in the world, but presuming the audience sees gender as primarily binary, it marks how these characters are likely perceived rather than what they are.

I’m not too worried about these caveats, though, since what we’re aiming for here isn’t precision engineering specs, but rather to get a numbers-based sense of the big patterns in screen sci-fi, and for that, a little bit of noise in the numbers is OK.

Lastly, not every character that you think might qualify does, so I should explain my rationale for what got in and what got left out.

What counts as an AI character?

I’ve tried to be strict about what counts as AI in that the intelligence of the character must be housed in non-biological circuitry. This leaves out some characters that on a cursory consideration would seem like a natural fit. For an example, compare The Stepford Wives (1975) and The Stepford Wives (2004). The wives in the original were robots through and through—mechanical, lookalike replacements of the original humans. But the wives in the remake were cyborgs, with robotic bodies housing their original, human brains. This means that in the original, the wives count as AI and appear in the survey. But because of this cyborg technicality, none of the “robotic” characters from the remake make it in. Not even the little cyborg dog.

Meanwhile, Rachael and Deckard, replicants from the Blade Runner universe, had a baby (according to Blade Runner 2049), so we can generalize and say replicants are capable of wholly biological reproductive acts. Given this you might think they’re out of the survey, but since they are fabricated, they get in.

Also, T-800 Terminators (the Arnold kind) get in, because even with their wetware bodies, the intelligence they carry is non-biological.
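For readers who think in code, here is one way to write down that inclusion rule. This is only a sketch of my reading of the examples above: the function name and the two boolean inputs are illustrative assumptions, not columns in the Google Sheet.

```python
def counts_as_ai(intelligence_is_biological: bool, fabricated: bool) -> bool:
    """Rough inclusion test for the survey (an interpretation, not the sheet's formula).

    A character gets in if its intelligence is housed in non-biological
    circuitry, or if it is a wholly fabricated being (the replicant case).
    """
    return (not intelligence_is_biological) or fabricated


# The four cases discussed above, encoded as (biological intelligence?, fabricated?)
examples = {
    "Stepford wife (1975)": (False, True),   # robot through and through
    "Stepford wife (2004)": (True, False),   # original human brain in a robotic body
    "Replicant": (True, True),               # biological, but fabricated
    "T-800": (False, True),                  # wetware body, non-biological intelligence
}

for name, (biological, fabricated) in examples.items():
    verdict = "in" if counts_as_ai(biological, fabricated) else "out"
    print(f"{name}: {verdict}")
```

Only the 2004 Stepford wives come out as "out" under this reading, which matches the calls described above.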

I know, it’s complex and sometimes counter-intuitive. Such is data.

OK, so looking at those attributes for those characters, the first thing we should look at is the distributions. This includes all sorts of questions like: How many AI present as men? How many as women? How many are nonbinary? What kinds of bodies do they have? Who is master of whom?

It’s thrilling, thrilling data analysis action, so stay tuned.

Gendered AI: Gender Presentation and Distributions in sci-fi AI

In the first post of this series, I explained what I was out to learn, what I looked at, and how I tagged it. Ultimately, we want to look at the data and be able to answer questions like “Are female AIs more subservient than male AIs?” And in order to do that, we first have to understand what the distributions are for sex and subservience. So let’s talk distributions.

Distribution is a fancy term for how many of each value we see for a given attribute. For example, if we wanted to look at the distribution of eye color across the world, we would count how many browns, blues, hazels, ambers, greens, grays, and reds we see (finding a way to deal with heterochromia, etc.) and compare them in a bar chart.
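In practice a distribution is just a tally. Here is a minimal Python sketch of the idea, using a handful of made-up eye-color observations (invented for illustration, not real-world or survey data).

```python
from collections import Counter

# Hypothetical observations, invented purely for illustration.
observations = ["brown", "brown", "blue", "hazel", "brown", "green", "blue"]

# Counter tallies how many times each value appears.
distribution = Counter(observations)

# Print a crude text bar chart, most common value first.
for value, count in distribution.most_common():
    print(f"{value:<6} {'#' * count} ({count})")
```

The same tally, run over a column of the survey instead of eye colors, is all that the charts in this series show.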

Of course eye color is not of interest in this case. For Gendered AI, we are interested in comparing other attributes to gender presentation. We’ll look at the other attributes in later posts, but we’re going to begin with sex ratio, and that will fill up a post all its own.

Simple sex ratio

Author’s request: With that section title I know some hackles are already raised. Please know this is a very tough space to write for. Despite having paid for a number of content reviews, I may have made some missteps. I am a n00b writer on these topics, and I respond best to friendly engagement rather than a digital pillory.

The very simple explanation of sex ratio is women-to-men. But of course that’s waaaaay too simple for either the real world or our purposes. At the very (very) least, AI might have no gender, so we need a “none” or “other” category. Let’s start with these very oversimplified numbers and move to more detailed ones later.

The chart below shows the data from the survey, focusing on the simple categories of female, other, and male. AI characters are strongly overweighted male, with a rough ratio of 2 male : 1 female : 0.75 other. The 2:1 M:F ratio is eerily in line with the finding of USC’s Signal Analysis and Interpretation Laboratory that, across the speaking roles in the 1,000 scripts they studied, men’s dialogue, and even the number of male characters, was double (or more) that for women. This is very different from the real-world sex ratio of roughly 1:1 reported in the Wikipedia article about world sex ratios.

I would talk about the weird discrepancies of just this distribution, but any ranting at this point would be overshadowed by the ranting that happens next. Deep breath.

Having an “other” category isn’t enough. After all, characters in one of these bars can be as different as HAL and Gigolo Joe, and that doesn’t seem right. So, let’s break this oversimplification down into more refined bits.

More detailed gender presentation ratios

First, of course, we should note that characters rarely discuss gender directly, and—at least in this sample—discuss gender dysphoria all of never. Also we can’t reach out to ask any of them directly since they’re fictional. So when I speak of gender, it should be read as “gender presentation,” and unfortunately at this point you are stuck with nothing more scientific than my reading of the following four variables.

  • Primary sex characteristics, or biological presentation: The presence of masculine or feminine sexual organs. None of the titles I reviewed were pornographic, and full-frontal nudity is pretty rare up until Westworld, so this often comes down to implication. Gigolo Joe, for instance, could not do what must be a key part of his primary function without male sex organs (with all the important caveats that penetrative sex is just one kind of sex), so he is listed as “Masculine” here.
  • Secondary sex characteristics, or body presentation: These are much more directly observable, and include those other markers of sex, like facial hair and shoulder-to-hip ratio.
  • Voice presentation: This is my hearing of whether the voice has a lower, masculine register, or a higher, feminine register. (In a few cases I checked on the actor listing in IMDB and did web searches for evidence of self-identification.)
  • Pronoun presentation: How other characters refer to the AI character with pronouns. R2D2, for instance, has absolutely no sex characteristics, and no voice, but is still referred to as a “he” throughout the Star Wars franchise.

A note on labeling: I’m aware that there are tricky nuances in the labels. After all, how is body not part of one’s biology? But the shorthand proves useful: we can say “BIO” and know what it means instead of always having to use the longer phrase “implicit or explicit primary sex characteristics.”

For each AI character, I tagged each of these variables as either Masculine, Fluid, Neutral, Feminine, Unknown, Multiple, Many, or N/A. (The “n/a” may seem weird, but for instance, HAL doesn’t have a body, so primary and secondary sex characteristics are not applicable.)

Socially male, but existentially neutral.

Combining voice and pronouns into “social”

There are plenty of characters with no voice or non-human voices, and a few characters that are not referred to by pronoun. Since these two indicate a social performance of gender, I treated them in the algorithms as an “OR” when considering stacking. That means if either variable was present, and they didn’t contradict, I counted it as the presenting aspect. Compare these two examples…

  • Wall·E: N/A Primary, N/A Secondary, masculine voice, unmentioned pronoun = socially male
  • R2D2: N/A Primary, N/A Secondary, neutral voice, male pronoun = also socially male
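The “OR” described above can be sketched in a few lines of Python. This is not the spreadsheet formula itself, just an illustration: the function name, the lowercase tag strings, and the "conflict" result for contradictory tags are all my assumptions (the post does not say how contradictions were resolved).

```python
GENDERED = {"masculine", "feminine"}


def social_presentation(voice: str, pronoun: str) -> str:
    """Combine voice and pronoun tags into a single 'social' presentation.

    Any tag other than masculine/feminine (e.g. "neutral", "unmentioned",
    "n/a") is treated as carrying no signal. That treatment is an assumption.
    """
    signals = {tag for tag in (voice, pronoun) if tag in GENDERED}
    if not signals:
        return "none"
    if len(signals) == 1:
        return signals.pop()
    # Assumption: contradictory tags are flagged rather than resolved.
    return "conflict"


print(social_presentation("masculine", "unmentioned"))  # the Wall·E case
print(social_presentation("neutral", "masculine"))      # the R2D2 case
```

Both calls come out "masculine", which is the “socially male” result in the two examples above.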

They stack

The main thing to note about how these three variables (counting voice and pronouns as “socially”) played out is that they overwhelmingly stacked. That’s not a term of art, so let me explain. It means that if a character has masculine primary sex characteristics, that invariably meant that he also had masculine secondary sex characteristics, and voice/pronouns. If a character had no evidence of primary sex characteristics, but had feminine secondary sex characteristics, she invariably had feminine voice/pronouns.

It makes more sense if I show you. So, here are six representative examples from the survey of how this monosex stacking looks.

I suspect this is an effect of binary concepts of gender on the part of the makers of the sci-fi, implemented as increasingly detailed costumes for the AI. But when you consider these variables, these 6 are a pale semblance of what could be. Include “fluid” or “nonbinary” as a possibility, and don’t bother with stacking, and there are 58 more possible combinations of these variables.
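To see where numbers like these can come from, here is a small enumeration sketch. It assumes three variables (biological, bodily, social) with four possible values each. That value set is my guess at one reading that reproduces the counts; it is not taken from the sheet.

```python
from itertools import product

# Assumed value set for each variable; the post does not list it explicitly.
VALUES = ("masculine", "feminine", "fluid", "none")


def is_stacked(bio: str, body: str, social: str) -> bool:
    """True for the monosex 'stacked' patterns.

    A stacked character has one binary gender socially, and any more
    detailed layer that is present (bodily, then biological) matches it.
    """
    if social not in ("masculine", "feminine"):
        return False
    g = social
    return (bio, body) in {(g, g), ("none", g), ("none", "none")}


all_combos = list(product(VALUES, repeat=3))
stacked = [combo for combo in all_combos if is_stacked(*combo)]

print(len(all_combos), len(stacked), len(all_combos) - len(stacked))
# prints: 64 6 58
```

Under those assumptions the 6 stacked patterns are the biologically, bodily, and socially gendered versions of male and female, and everything else in the grid goes unused.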

Hey, want to feel both hyper-reductive and overwhelmed at the complexity of gender? Try writing a categorization algorithm for analysis.

Anyway, if they hadn’t stacked like they did, I would have had to describe their genders with a four-part code that would result in 64 genders. But, because they do stack, that meant there were these 6, plus “multiple,” “genderfluid,” “neutral,” and “none,” for a total of 9. Note that online lists of genders vary from the 58 available to Facebook’s users to the 229 found on this more creative list (my favorite is “Schrodigender – A gender which you can both feel and not feel,” giving a clue to how serious that particular list is). So while 9 can feel heavy, it does not compare to the complexity of the real world.

OK, given those descriptions of the subcategories, here’s how the numbers played out in the much more detailed analysis of gender presentation in sci-fi AI.

Detailed gender presentation

I’ve noted that we’re here for the correlations, not distributions, but in and of itself, this is remarkable. The subcategories provide a deeper (and more troubling) look into the data, and are necessary because these categories have to be thought of differently. Observe, for example, that the biologically-gendered characters are nearly at parity, while the bodily- and socially-gendered characters skew male. There is a frustrating 2:1 ratio for bodily male:bodily female and an infuriating 5:1 for socially male:socially female.

These ratios bear…discussion.

1 biologically male : 1 biologically female

A harsh interpretation of this stat would read a kind of heterosexual panic, where—when sex or procreation is involved—Hollywood needs to assert loudly over a hastily-ordered beer that whoa whoa whoa: Only AI chicks and AI dudes get it on. Or if they do get it on with people it’s with the right gender.

Or, more charitably I suppose, humans are largely heterosexual, and since there is a rough 1:1 sex ratio in humans, there should be a 1:1 sex ratio in the AIs they pair with. (?) It’s a hard thing to second-guess.

It gets darker in the other categories where the sci-fi AI has a body but no biological apparatus. The ratios still skew heavily male. As if, when it comes to just being a person, a total sausagefest is the norm.

I await the disturbing fanfic.

2 bodily male : 1 bodily female

Recall from above that this category is reserved for those AI characters that present a gendered body but do not have gendered reproductive or sexual capabilities. We will discuss the germane-ness and embodiment of these AIs in a later post, but for now we can note that this category of AI character, with its 2:1 ratio, is roughly in the middle between the biologically and socially gendered categories, and in line with the oversimplified distribution seen above.

5 socially male : 1 socially female

This is the category where the only markers of gender are voice and pronouns. In other words, characters for whom a gender seems like an arbitrary choice. WTF is up with a 5:1 ratio? Why are all these “arbitrarily” gendered AI characters guys? We’ll talk about germaneness to the story later, but I want to see if there is some extradiegetic reason first.

Is it the available voice talent?

We have to acknowledge that filmmakers must hire someone to voice their speaking AI characters, even if there are no other markers. Despite the fact that…

…it’s fair to say that most available voice talent is recognizably gendered, and the AI character may just inherit the presentation of its actor. Then you might expect the roles to match the sex ratios in the available talent pool. I couldn’t find any formal studies of this, so I created a throwaway account on—a major job site for voice actors—and performed separate searches for male and female talent. There I found 42,786 male and 24,347 female non-union voice actors, around 2:1. (Union actors were closer to 1:1, with 3,079 male and 2,336 female. n.b. The site gives only those two gender options in its search.) Though that’s more anecdotal than I’d like, even the worse ratio of 2:1 still pales compared to the 5:1 of socially gendered AI, so no, that’s not it. You might think that explains the “simply” gendered characters, but my suspicion is that the genders of the characters are set in the script and pass down through the process, unquestioned after that.

Is it what sci-fi audiences want?

Might the ratio be some sales rationale, some presumption that sci-fi audiences are mostly men and therefore might be more interested in male characters? No. Of course numbers vary by show and genre, but this article by Victoria McNally shows that there is only a slight majority of men in these audiences (hovering around 60% male and 40% female, rather than the 83% male and 17% female that the 5:1 socially gendered ratio would have you believe).

Plus the 2018 annual Hollywood Diversity Report by UCLA shows that “new evidence from 2015–16 suggests that America’s increasingly diverse audiences prefer diverse film and television content,” so we would have to greatly exaggerate the connection between the sex ratio of the audience and those we see here.

There has to be some other reason, and I suspect it’s the dark patriarchal notion that “male” is somehow the default gender. Even though it is, literally, not.
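The arithmetic behind these comparisons is easy to check. The sketch below uses only numbers quoted in this post (the voice-talent counts and the 5:1 socially gendered ratio); the helper name `shares` is just an illustrative choice.

```python
def shares(*parts: float) -> list[float]:
    """Convert a ratio such as 5:1 into percentage shares that sum to ~100."""
    total = sum(parts)
    return [round(100 * p / total, 1) for p in parts]


# Voice-talent counts quoted above, as (male, female).
non_union = (42_786, 24_347)
union = (3_079, 2_336)

# Males per female in each talent pool.
print(round(non_union[0] / non_union[1], 2))
print(round(union[0] / union[1], 2))

# What a 5:1 male:female ratio looks like as audience-style percentages.
print(shares(5, 1))
```

Neither talent pool gets anywhere near five males per female, and a 5:1 ratio works out to roughly 83% to 17%, far from the 60/40 audience split.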

Is it that Hollywood itself is mostly white and male?

The 2018 Hollywood Diversity Report shows that gatekeepers, writers, directors, and (points at self) critics are still overwhelmingly white and male. White male writers and directors account for 91.9% and 86.2% of their fields, respectively. This is closer to the 83% male, but still a crappy, crappy excuse for the default assignment of AI as male. Representation matters and this is sorry representation.

P.S. Don’t get uppity, real world

The Global Gender Gap Report issued on 17 DEC 2018 by the World Economic Forum showed (in collaboration with LinkedIn) that women only occupy 22% of jobs in AI professions. (See pages viii and 28–35 of that report.)

So yeah.

Pictured: Sci-fi AI, mostly

You probably had a general sense of this disparity from simply being an audience member. But it’s “nice” to have some data to back it up. Be forewarned: It gets worse when we look at correlations. (No, really.) But before we do that, we should look at the rest of the distributions, starting with embodiment in the next post.

Gendered AI: Embodiment

Where we are: To talk about how sci-fi AI attributes correlate, we first have to understand how their attributes are distributed.  In the first distribution post, I presented the foundational distributions for sex and gender presentation across sci-fi AI. Today we’ll discuss embodiment.

As always, you can read the Gendered AI posts in order or check out the source data for more information.


Another simple measurement is how the AIs are embodied. That is, how do they manifest in the world of the story (or diegesis): Are they walking around, appearing as a screen on a wall, or as pulsing stars in the cosmos?

The categories that emerged from the survey were as follows:

  • Virtual, where a character only had, for example, a body or face that was generated for presentation to other characters on a screen or via volumetric projection. Joi from Blade Runner 2049 is virtual.
  • Disembodied, if the AI has no particular embodiment, or only an ad-hoc one. The Machine from Person of Interest is disembodied.
  • Personal computer: Edgar from Electric Dreams is a personal computer. In this regard, Edgar is sui generis, a category containing only one example.
  • Architectural: Some AIs are stuck to the walls of a building. HAL 9000 from 2001: A Space Odyssey is architectural.
  • Vehicular, where a character is embodied in a vehicle of some sort. K.I.T.T. from Knight Rider is vehicular.
  • Zoomorphic robot, where the robot is built to look something like an animal. Often these characters do not have voice. Muffit from the original Battlestar Galactica television series is an example.
  • Mechanical robot, where the robot is mechanical (and more mechanical looking than humanoid looking). WALL·E is mechanical.
  • Anthropomorphic robot, where the robot is proportioned like a human, and has most all the surface features of a human, but is readily identifiable as a robot. The Iron Giant is anthropomorphic.
  • Indistinguishable from human, where the robot can “pass” as a human. Only detailed or violent inspection will reveal it to be non-human. Aida from Agents of S.H.I.E.L.D. is indistinguishable from humans.

Here’s what that looks like in a bar chart.

Sometimes the details are tricksy

Sci-fi can make these things tricky. For example, the virtual crewmembers of the U.S.S. Callister might be considered indistinguishable from humans—as long as they are wearing clothes. Their unfortunate captain (and captor) had them created in virtual space such that they had no genitals. They are listed as bodily male and bodily female (rather than biologically) even though they are also indistinguishable from human.

Similarly, David from Prometheus has a fingerprint with a subtle Weyland-Yutani logo maker’s mark built into it (see the image below), but since this would only be apparent to someone who knew exactly where to look and for what, David is also listed as indistinguishable from human.

He just has to find crimes that don’t involve fingerprints.

Why so human?

My conjecture to explain the high number of AIs that are indistinguishable from human is threefold.

First, it is a matter of production convenience—that is, it is much easier and cheaper to insert a line of dialogue that establishes a character as a human-looking robot, rather than any of the other ways of signaling robotic-ness:

  • Create a costume like Robby the Robot
  • Make a puppet like Teddy from A.I. Artificial Intelligence
  • Do prosthetic makeup like The Terminator
  • Create a set piece that syncs with audio like Alphy from Barbarella
  • Produce special effects, like Ava from Ex Machina

There’s also a fit-to-media argument which notes that people are much better and more comfortable at reading the emotional states of people than they are of machines. If catharsis, or the emotional journey, is part of what the art is about, humans work as a medium. (This lack of emotional information in interfaces was played to great effect in 2001: A Space Odyssey, unnerving us with the psychopathy of HAL’s unblinking eye.) Actors, too (I highly suspect) enjoy using their bodies, voices, and faces to do their jobs without the additional layers of prosthetics or puppetry. So we would expect an overweighting of indistinguishable from humans because they are often the best tools for the narrative job, from both the audience’s and the actor’s perspective.

Not a lot of emotive potential here.

There’s another argument—a genre-and-narrative argument—that people are mostly interested in stories about people, and most sci-fi is a speculation about social effects rather than actual technology, and so indistinguishable robots are the best embodiment of what we’re interested in, anyway. Humans, just with different rules.

To riff on Daniel Mallory Ortberg’s “what if phones but too much”…

  • What if humans, but smarter and faster and helpful?
  • What if humans, but relentless indestructible assassins?
  • What if humans, but their enslavement was “ok”?
  • What if humans, but too much?

Gendered AI: Gender of master

Where we are: To talk about how sci-fi AI attributes correlate, we first have to understand how their attributes are distributed.  In the first distribution post, I presented the foundational distributions for sex and gender presentation across sci-fi AI. Today we’ll discuss the gender of the AI’s master.

Gender of Master

In the prior post I shared the distributions for subservience. And while most sci-fi AI are free-willed, what about the rest? Those poor digital souls who are compelled to obey someone, someones, or some thing? What is the gender of their master?

Of course this becomes much more interesting when later we see the correlation against the gender of the AI, but the distribution is also interesting in and of itself. The gender options of this variable are the same as the options for the gender of the AI character, but the master may not be AI.

Before we get to the breakdown, this bears some notes, because the question of master is more complicated than it might first seem.

  • If a character is listed as free-willed, I set their master as N/A (Not Applicable). This may ring false in some cases. For example, the characters in Westworld can be shut down with near-field command signals, so they kind of have “masters.” But, if you asked the characters themselves, they are completely free-willed and would smash those near-field signals to bits, given the chance. N/A is not shown in this chart because masterlessness does not make sense when looking at masters.
  • Similarly, there are AI characters listed as free-willed but whose “job” entails obedience to some superior; like BB-8 in the Star Wars diegesis, who is an astromech droid, and must obey a pilot. But since BB-8 is free to rebel and quit his job if he wants to, he is listed as free-willed and therefore has a master of N/A.
  • If a character had an obedience directive like, “obey humans,” the gender of the master is tagged as “Multiple.” Because Multiple would not help us understand a gender bias, it is not shown on the chart.
  • The Terminator robots were a tough call, since in the movies in which most of them appear, Skynet is their master, and it does not gain a gender until Terminator Salvation, when it appears on screen as a female. Later it infects a human body that is male in Terminator Genisys. Ultimately I tagged these characters as having a master of the gender particular to their movie. Up to Salvation it’s None. In Salvation it’s female, and in Genisys it’s male.
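Taken together, the notes above amount to a small decision rule. Here is a hedged Python sketch of it: the function name, argument names, and the "Unknown" fallback are my assumptions for illustration, and the real tagging was done by hand in the sheet.

```python
from typing import Optional

# Tags that the notes above say are left off the chart.
EXCLUDED_FROM_CHART = {"N/A", "Multiple"}


def tag_master(subservience: str,
               master_gender: Optional[str] = None,
               general_directive: bool = False) -> str:
    """Return the gender-of-master tag for one AI character."""
    if subservience == "free-willed":
        return "N/A"            # free-willed characters have no master
    if general_directive:
        return "Multiple"       # e.g. a blanket "obey humans" directive
    # Assumption: fall back to "Unknown" when no master gender was recorded.
    return master_gender or "Unknown"


tags = [
    tag_master("free-willed"),                                   # the BB-8 case
    tag_master("slavish obedience", general_directive=True),     # "obey humans"
    tag_master("improvisational obedience", master_gender="male"),
]
charted = [t for t in tags if t not in EXCLUDED_FROM_CHART]
print(tags, charted)
```

Only the last of those three example tags would survive into the chart.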

So, with those notes, here is the distribution. It’s another sausagefest.

Again, we see the masters are highly skewed male. This doesn’t distinguish between human male and AI male, which partly accounts for the high biologically male value compared to male. Note that sex ratios in Hollywood tend towards 2:1 male:female for actors, generally. So the 12:1 (aggregating sex) that we see here cannot be written off as a matter inherited from available roles. Hollywood tells us that men are masters.

The 12:1 sex ratio cannot be written off as a matter inherited from available roles. It’s something more.

Oh, and it’s not a mistake in the data: there are no socially female AI characters who are masters of another AI of any gender presentation. That leaves us with 5 female masters, countable on one hand, and the first two can be dismissed as a technicality, since these were identities adopted by Skynet as a matter of convenience.

  1. Skynet-as-Kogan is master of John, the T-3000, from Terminator Genisys
  2. Skynet-as-Kogan is master of the T-5000 from Terminator Genisys
  3. Barbarella is master of Alphy from Barbarella
  4. VIKI is master of the NS-5 robots from I, Robot
  5. Martha is master of Ash in Black Mirror, “Be Right Back”

Gendered AI: Subservience and free will

Where we are: To talk about how sci-fi AI attributes correlate, we first have to understand how their attributes are distributed.  In the first distribution post, I presented the foundational distributions for sex and gender presentation across sci-fi AI. Today we’ll discuss subservience and free will.

Subservience/Free will

The degree of free-willedness is tagged as subservience.

  • The majority of AIs are free-willed, that is, answerable only to their own conscience. Ultron from the Marvel Cinematic Universe is free-willed.
  • A large proportion answer to some master, but enjoy a wide berth in interpreting instructions, and can formulate new plans to achieve goals. These are tagged as improvisational obedience. Gort from The Day the Earth Stood Still is one of these.
  • A few are tagged as slavish obedience, and will take no action unless ordered to do so and will only take the action instructed. Robby the Robot in Forbidden Planet is slavishly obedient.
  • A small minority are bound to a master against their will. These characters are tagged with reluctant obedience. Ava from Ex Machina was only reluctantly obedient, and took great pains to escape.

This reinforces the notion we saw with embodiment: subservience is the exception for these characters. Mostly they are as free-willed as people are. (Insert determinist counter-argument here.)

Gendered AI: Germane-ness Distributions

Where we are: To talk about how sci-fi AI attributes correlate, we first have to understand how their attributes are distributed. In the first distribution post, I presented the foundational distributions for sex and gender presentation across sci-fi AI. Today we’ll discuss how germane the AI character’s gender is to the plot of the story in which they appear.


Is the AI character’s gender germane to the plot? This aspect was tagged to test the question of whether characters are by default male, and only made female when there is some narrative reason for it. (Which would be shitty and objectifying.) To answer such a question we would first need to identify those characters that seem to have the gender they do for a narrative reason, set them aside, and look at the sex ratio of what remains.

Example: A human is in love with an AI. This human is heteroromantic and male, so the AI “needs” to be female. (Samantha in Her by Spike Jonze, pictured below).

If we bypass examples like this, i.e. of characters that “need” a particular gender, the gender of those remaining ought to be, by exclusion, arbitrary. This set could be any gender. But what we see is far from arbitrary.

Before I get to the chart, two notes. First, let me say, I’m aware it’s a charged statement to say that any character’s gender is not germane. Given modern identity and gender politics, every character’s gender (or lack of, in the case of AI) is of interest to us, with this study being a fine and at-hand example. So to be clear, what I mean by not germane is that it is not germane to the plot. The gender could have been switched and say, only pronouns in the dialogue would need to change. This was tagged in three ways.

  • Not: Where the gender could be changed and the plot not affected at all. The gender of the AI vending machines in Red Dwarf is listed as not germane.
  • Slightly: Where there is a reason for the gender, such as having a romantic or sexual relation with another character who is interested in the gender of their partners. It is tagged as slightly germane if, with a few other changes in the narrative, a swap is possible. For instance, in the movie Her, you could change the OS to male, and by switching Theodore to a non-heterosexual male or a non-homosexual woman, the plot would work just fine. You’d just have to change the name to Him and make all the Powerpuff Girl fans needlessly giddy.
  • Highly: Where the plot would not work if the character were another sex or gender. Rachael gave birth between Blade Runner and Blade Runner 2049. Barring some new rule for the diegesis, this could not have happened if she were male, nor (spoiler) would she have died in childbirth, so 2049 could not have happened the way it did.

Second, note that this category went through a sea-change as I developed the study. At first, for instance, I tagged the wives in The Stepford Wives as Highly Germane, since the story is about forced gender roles of married women. My thinking was that historically, husbands have been the oppressors of wives far more than the other way around, so to change their gender is to invert the theme entirely. But I later let go of this attachment to purity of theme, since movies can be made about edge cases and even deplorable themes. My approval of their theme is immaterial.

So, the chart. Given those criteria, the gender of characters is not germane the overwhelming majority of the time.

At the time of writing, there are only six characters that are tagged as highly germane, four of which involve biological acts of reproduction. (And it would really only take a few lines of dialogue hinting at biotech to overcome this.)

  • XEM
  • A baby? But we’re both women.
  • HIR
  • Yes, but we’re machines, and not bound by the rules of humanity.
  • HIR lays her hand on XEM’s stomach.
  • HIR’s hand glows.
  • XEM looks at HIR in surprise.
  • XEM
  • I’m pregnant!

Anyway, here are the four breeders.

  • David from Uncanny
  • Rachael from Blade Runner (who is revealed to have made a baby with Deckard in the sequel Blade Runner 2049)
  • Deckard from Blade Runner and Blade Runner 2049
  • Proteus IV from the disturbing Demon Seed

The last two highly germane characters are cases where a robot was given a gender in order to mimic a particular living person, and in each case that person is a woman.

  1. Maria from Metropolis
  2. Buffybot from Buffy the Vampire Slayer

I admit that I am only, say, 51% confident in tagging these as highly germane, since you could change the original character’s gender. But since this is such a small percentage of the total, and would not affect the original question of a “default” gender either way, I didn’t stress too much about finding some ironclad way to resolve this.

Gendered AI: Goodness Distributions

Where we are: To talk about how sci-fi AI attributes correlate, we first have to understand how their attributes are distributed.  In the first distribution post, I presented the foundational distributions for sex and gender presentation across sci-fi AI. Today we’ll discuss goodness.

Goodness vs. Evilness

Goodness is a very crude estimation of how good or evil the AI seems to be. It’s wholly subjective, and as such it’s only useful for spotting patterns rather than for ethical precision.

If you’re looking at the Google Sheet, note that I originally called it “alignment” because of old D&D vocabulary, but honestly it does not map well to that system at all.

  • Very good are AI characters that seem virtuous and whose motivations are altruistic. Wall·E is very good.
  • Somewhat good are characters who lean good, but whose goodness may be inherited from their master, or whose behavior occasionally is self-serving or other-damaging. JARVIS from Iron Man is somewhat good.
  • Neutral or mixed characters may be true to their principles but hostile to members of outgroups; or exhibit roughly-equal variations in motivations, care for others, and effects. Marvin from The Hitchhiker’s Guide to the Galaxy is neutral.
  • Somewhat evil characters are characters who lean evil, but whose evil may be inherited from their master, or whose behavior is occasionally altruistic or nurturing. A character who must obey another is limited to somewhat evil. David from Prometheus is somewhat evil.
  • Very evil are AI characters whose motivations are highly self-serving or destructive. Skynet from The Terminator series is very evil, given that whole multiple-time-traveling-attempts-at-genocide thing.

Though slightly more evil than good, it’s a roughly even split in the survey between evil, good, and neutral AI characters.

Gendered AI: Category of Intelligence

Where we are: To talk about how sci-fi AI attributes correlate, we first have to understand how their attributes are distributed.  In the first distribution post, I presented the foundational distributions for sex and gender presentation across sci-fi AI. Today we’ll discuss categorically how intelligent the AI appears to be.


AI literature distinguishes between three levels.

  • Narrow AI is smart but only in a very limited domain and cannot use its knowledge in one domain to build intelligence in novel domains. The Spider Tank from Ghost in the Shell is narrow AI.
  • General AI is human-like in its knowledge, memory, thinking, and learning. Aida from Agents of S.H.I.E.L.D. possesses a general intelligence.
  • Super AI is inhumanly smart, outthinking and outlearning us by orders of magnitude. Deep Thought from The Hitchhiker’s Guide to the Galaxy is a super AI.

The overwhelming majority of sci-fi AI displays a general intelligence.

Gendered AI: AI Picks Female

Where we are in this series: I just finished showing how AI in sci-fi presents gender, what bodies it is given, how subservient it is, the gender presentation of the masters of AI, how germane the gender of the AI was to the plot of the stories in which they appear, how good or evil those AIs were, and what category of AI they seemed to be. Next up we’re going to look at the correlations of those distributions to gender, but first a fun fact from the survey.

There are all of three AI characters who elect their gender presentation for some reason other than deception.

1. In “The Offspring” episode of Star Trek: The Next Generation, Data builds an adult child named Lal. Data gives Lal the opportunity to pick their gender, and Lal picks female.

2. Holly, the AI in Red Dwarf begins presenting male and after a bit reveals that she would rather present as female. Later, she is destroyed and rebuilt from an earlier copy, when the AI presents as male again, but notably, this was not Holly’s decision.

3. The Machine, from Person of Interest (shout-out: it won the award for best representation of the AI science in the Untold AI series, and a personal favorite) chooses in the last season to adopt the voice of its main devotee, Root, who is female.

The Machine is never directly embodied in the series, but here’s a pic of Root.

Though this is a very small sample inside our dataset, it is notable in light of the male bias that AI characters show. By these examples,…

when an AI chooses a gender presentation, it is always a female.

Not quite “picking a gender”

There are a handful of other times an AI winds up with a gender presentation that cannot quite be said to be a matter of personal preference.

  • If you’re wondering about the Maschinenmensch from Metropolis, its gender is not a choice, but something assigned to it by the mad scientist Rotwang as part of a plot of deception.
  • If you’re thinking of Skynet, from the Terminator series, it has no presenting gender until Terminator Salvation. In that film the AI chooses to mimic a female character, Dr. Kogan, because “Calculations confirm Serena Kogan’s face is the easiest for [Marcus] to process.” It assures him that if he preferred someone different, Skynet could mimic another person. So this is not picking gender for an identity reason as much as a mask for efficacy.
  • Later in Terminator Genisys, Skynet is embodied as a man, the T-5000 known as “Alex,” but this appears to be the opportunistic colonization of an available body rather than a selection by the AI.
  • The Puppet Master from Ghost in the Shell is similarly an opportunistic colonization of a female cyborg. There might be some selection process in the choice of a victim, but that evidence is not on screen.
  • In Futurama, Bender has also opted several times to be female, but it is for the express purpose of getting something out of the deal, such as competing in the Robo-Olympics or to play a heel character in wrestling. By the end of each episode, he’s back to being his old self again.

If you know of additional or even counterexamples, let me know so I can add them to the database. But as of right now, the AI future looks female.

Gendered AI: Correlation 101 & Method

So the basic distributions (prior posts in the series) are fascinating themselves, but what brought us to this study is how those counts correlate. And while you could correlate any of these attributes (gender, embodiment, subservience, etc.) against any other, what follows is a measure of the correlation of gender to the other attributes.

In case you are not familiar with correlations, here’s the sci-fi interfaces “correlations 101”.

Ratios of values

Let’s say you have a group of 100 people, and you know their sex (simplified as male and female for this explanation) and their eye color (simplified again to green, blue, or brown). Let’s also say there’s a perfectly even ratio of attributes. Half are male and half are female. One-third of people have green, another third have blue, and the last third have brown eyes.

gender by Gregor Cresnar and Eye by Santiago de Souza, from the Noun Project

Correlations across attributes

The question of correlation goes something like this: When we meet a female in this group, what are the odds her eyes are brown?

In a perfect distribution of sex and eye color, you might expect ⅓ of women to have green eyes, ⅓ of women to have blue eyes, and ⅓ to have brown eyes. After all, ⅓ of (this imaginary) population does, and women are half of that, so, logically, ⅓ of them should have brown eyes. That would mean that for any of these females, the odds should be around 33% that their eyes are brown.

But if, looking at the data, you actually found that ⅔ of women had blue eyes and ⅓ of the women had green eyes, you would have a very imperfect distribution, and you would rightly wonder what was going on. Why do the guys have all the brown eyes? Is blue-eyed-ness somehow connected to being female? This would point at something weird going on, bearing further inquiry. What’s Up with Dudes Having all the Brown Eyes? Thank you for coming to my TED talk.

So that’s a basic explanation. Of course we don’t really care about eye color. But if you swap out eye color for, say, wealth, you can see why we might care about looking at correlations. If the top 33% of earners were all dudes, we’d try and suss out why the gross wealth inequality.

Now, circles and wedges make for easy pedagogical shapes, but they’re not that great for understanding the data, especially when it gets more complicated, say, with our 11 categories of sci-fi AI gender presentation. So instead of circular diagrams, I’ll use bar charts to show how far off from perfect each attribute is. In the case of the perfect distribution, the bars would be at zero, as on the lower left in the image above. It would be a very boring bar chart.

But in the case of the weird dudes-brown and ladies-blue scenario on the lower right, the bar charts for blue and brown would be correspondingly as far from zero as the chart will allow. The green attribute, since it was perfectly distributed in that example, still sits at zero. You’ll note though that if you added up all the blue values in the chart, they would sum to zero. The same for brown and green bars. If you cared to do a check of the data, this is one way you could check to see if it was valid.
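If you like to see that sum-to-zero check spelled out, here is a minimal Python sketch. It is not the formula from my sheet. The counts are hypothetical: the extreme dudes-brown, ladies-blue group scaled up to 300 people so every cell is a whole number. The function names are made up for illustration, and it works in raw head counts, so the numbers will not necessarily match the scale on the charts.

```python
from fractions import Fraction

# Hypothetical counts for the extreme scenario, scaled to 300 people
# (150 women, 150 men) so that every cell is a whole number.
observed = {
    "female": {"green": 50, "blue": 100, "brown": 0},
    "male":   {"green": 50, "blue": 0,   "brown": 100},
}

def deviations(table):
    """Observed count minus the count a perfect distribution would predict."""
    total = sum(sum(cols.values()) for cols in table.values())
    row_totals = {sex: sum(cols.values()) for sex, cols in table.items()}
    col_totals = {}
    for cols in table.values():
        for color, count in cols.items():
            col_totals[color] = col_totals.get(color, 0) + count
    return {
        sex: {
            # Expected count = row total * column total / grand total
            color: count - Fraction(row_totals[sex] * col_totals[color], total)
            for color, count in cols.items()
        }
        for sex, cols in table.items()
    }

def column_sums(dev):
    """Add up the deviations for each eye color across the sexes."""
    sums = {}
    for cols in dev.values():
        for color, value in cols.items():
            sums[color] = sums.get(color, 0) + value
    return sums

dev = deviations(observed)
check = column_sums(dev)

# Every eye color should net out to zero if the table is valid.
for color, net in check.items():
    print(color, "nets to", net)
```

Fractions are used rather than floats so that the zero really is zero and not a rounding near-miss.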

Of course real world data rarely, if ever, looks this extreme and clean. It’s usually more nuanced, and needs careful reading. In the example below, females are overweighted for blue eyes and males overweighted for the other two. That bar chart would look like this.

Note that it’s important to read the scale on the left. We’re no longer looking at 100-percent bars. The female-blue overweighting is only 16.67 percent. That would be significant, but not as significant as if it had peaked at 100. So be sure to read the scales.

My method

NOTE: If you’re not interested in the soundness of the methods, the rest of this post is going to be boring. But I need to lay out my methods to make sure I’m not doing my math wrong (if I was, we’d have to reconsider all the conclusions). I’ll also use as plain spoken language as I can in case you want to follow along. The good news is, it’s pretty simple math.

If we were working with floating-point values, then we might be able to do some fancy math called a Pearson correlation to measure correlations. I did this as part of the Untold AI study. But each of our variables in the Gendered AI study is categorical, more like eye color than weight. So I had to go about looking at correlations in a different way.

  1. First I looked at simple counts for all combinations of attribute pairs. For example: There are 2 biologically female very good AI characters, and 3 biologically male very evil characters,…
  2. Then I looked at the percentage of each value in its attribute. 7% of characters are very good, for example. 10% of characters are biologically female.
  3. I performed a simple multiplication of the percentages of each value to understand what a perfect distribution would be for those value pairs. Given that 7% are very good and 10% are biologically female, if very goodness and biological femaleness were perfectly distributed, we would expect .7% of all characters to be very good and biologically female.
  4. I then multiplied that times the number of characters in the survey, and came up with the number of characters we would expect to see with those two values. Given 327 characters, and an expected .7%, we would expect to see 2.289 characters in the survey with this combination. (Characters can’t have fractional attributes in my method, but I don’t round until the end.)
  5. Next I subtracted the perfect distribution number from the actual number to come up with variance. A negative means we see less than we would expect. A positive means we see more than we expect.
  6. I then translated those variance units to a percentage of the total number of characters. This lets us compare apples to apples across attribute pairs, regardless of size.
  7. Finally I created some conditional formatting that showed the lowest number across the correlations as the darkest red, the highest number across the set as darkest green, zero as white, and everything in between on a scale between those three values. This allows us to look and at a glance see bias as color on a table. It’s not gorgeous infographics, but it is dense, effective data presentation.
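For anyone who would rather read code than prose, steps 3 through 6 can be sketched in a few lines of Python. This is an illustration, not the formulas in the sheet: the function names are invented here, and the inputs are the rounded example figures from the steps above (7% very good, 10% biologically female, 2 actual characters, 327 in the survey), so the live data will give slightly different numbers.

```python
TOTAL_CHARACTERS = 327  # survey size as of 08 APR 2019

def expected_count(share_a, share_b, total):
    """Steps 3-4: multiply the two shares, then scale to the survey size."""
    return share_a * share_b * total

def bias_percent(actual, share_a, share_b, total):
    """Steps 5-6: actual minus expected, as a percentage of all characters."""
    difference = actual - expected_count(share_a, share_b, total)
    return difference / total * 100

# Rounded example figures from the steps above (assumptions for illustration).
very_good_share = 0.07
bio_female_share = 0.10
actual_very_good_bio_female = 2

expected = expected_count(very_good_share, bio_female_share, TOTAL_CHARACTERS)
bias = bias_percent(actual_very_good_bio_female, very_good_share,
                    bio_female_share, TOTAL_CHARACTERS)

print(f"expected characters: {expected:.3f}")   # 2.289, matching step 4
print(f"bias: {bias:+.3f}% of all characters")  # negative: fewer than expected
```

With these example figures the bias comes out slightly negative, meaning the survey holds a hair fewer very good, biologically female characters than a perfect distribution would predict. Step 7 is just coloring a table of these bias values.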

In some cases it pays to compare the data as oversimplified binary gender counts (male, female, and other) and so you will find an aggregated table on the correlation page, that looks like this.

But of course there are detailed bias tables. They look like this.

Those can be hard to read, so in the posts, I instead present that data in the bar chart format that I showed way up at the top of this post.

This method is long, and tedious to recount, so rather than going through the chain for each correlation, I’ll just be showing tables when the comparison is interesting, showcasing the bar charts, and then talking about the results. You can see the whole chain, step by step, in the live Google sheet, right down to individual cell formulas. If you’re a data nerd, anyway.

Also, if you’re browsing the live sheet, you’ll see little black triangles in the upper right corner of some of the cells. These are “Notes” in the Google Sheet that show the exact examples. They take some processing, and so take a second or two to appear after you’ve changed the dropdown at the top.

So, for instance, if you wanted to know what examples were tagged as both “architectural” embodiment and “socially female” a rollover would reveal there are two: The city computer from Logan’s Run, and Deep Thought (pictured above). If there is not a note attached to a cell, that means there are no examples.

Data science people rightly want to know if the bias we see can be attributed to all that random noise that happens in real life. One way to test for that is something called a Chi Square Test. Those tests are at the bottom of the sheet. If the results aren’t statistically significant, the results could be dismissed. But, per the results of these Chi Square tests, the correlation studies cannot wholly be dismissed as noise.
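To give a feel for what such a test does, here is a minimal Python sketch of the standard Pearson chi-square test of independence. It assumes that is the variant in play; the sheet’s own formulas are not reproduced here. The table is the hypothetical 300-person eye color group from the correlations 101 above, not survey data, and the function name is invented for illustration.

```python
def chi_square(table):
    """Return (statistic, degrees_of_freedom) for a table of counts.

    `table` is a list of rows, each row a list of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count we would expect if the two attributes were independent
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# Hypothetical counts: rows are female, male; columns are green, blue, brown.
toy = [
    [50, 100, 0],
    [50, 0, 100],
]

stat, dof = chi_square(toy)

# Critical value for 2 degrees of freedom at the 0.05 significance level.
CRITICAL_05_DF2 = 5.991

print(f"chi-square = {stat:.1f} with {dof} degrees of freedom")
print("unlikely to be noise" if stat > CRITICAL_05_DF2 else "could be noise")
```

The bigger the statistic is relative to the critical value, the harder it is to wave the pattern away as chance. A perfectly even table would score zero.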

So that’s a lot, but it was necessary set-up. On to the correlations themselves!