
I watched Blade Runner for the first time last weekend. (The movie came out before I was born, so I feel like I should get a little bit of a pass on not having seen it).
The film plays with some of the themes we’ve seen in more recent sci-fi, AI, and robot movies. I’m thinking specifically of Ex Machina. If forced to bin both films in the same category, I’d say these are two movies that take seriously the question: what if robots did have emotions?
One of the things I appreciated about Blade Runner is that there are scenes in which the bad robots—the “Nexus-6 Replicants”—seem literally overwhelmed by their emotions. (Though, I’ll admit, my interpretation might be heavily influenced by the phase of parenting I’m in).
The creator of this robotic world, Dr. Eldon Tyrell, had enough foresight to recognize that the Replicants would behave even more like humans if they were given memories. So he gave them all memories.
What he (on my interpretation) failed to realize is that, in addition to a lifetime’s worth of memories, adult humans also have a lifetime’s worth of experience learning to manage their emotions—learning to feel without becoming crushed by those feelings.
That is, perhaps, the most relatable thing about the Nexus-6 Replicants. Their murder spree is, you know, less relatable.
It is worth noting that this imagined dystopia that Ridley Scott gave us back in 1982 was set in—wait for it—two thousand and nineteen. Needless to say, robotics technology has not developed at the rate that sci-fi writers, directors, and even serious technology prognosticators of the early 1980s anticipated.
We have Roombas and Teslas, but we don’t seem that close to Terminators and Replicants. Despite this limitation in hardware—even the most humanlike robots of 2025 don’t look as human as the Replicants—the rapid development in software does have us asking questions about machines and emotions.
In 2022, an engineer at Google made headlines, and shortly thereafter lost his job, when he publicly announced that Google’s chatbot, LaMDA, was sentient. Senior engineer Blake Lemoine had been assigned to test Google’s Language Model for Dialogue Applications (hence, “LaMDA”). In the course of his work, Lemoine came to believe that LaMDA could think, and even feel.
There is a question lurking here as we compare the current state of AI with the sci-fi movies of the mid- to late 20th century. Large language models really have actualized much of the aesthetics of sci-fi software. Like Captain Kirk in Star Trek, we can speak to the computer and it’ll speak back. Like HAL in 2001, our AI systems can get the wrong answers for reasons that are completely opaque to us.
But we still haven’t gotten the robot revolution that George Lucas and Ridley Scott promised us. I think the most important reason is that it’s hard to provide robots with context, and context is essential to the successful completion of tasks in the physical world.
When my daughter was just three years old, I remember once asking her to help me straighten up the apartment. She had several things—a backpack and a couple of toys and crayons—in the family room. One by one, I asked her to put each item in her room. At the time, I was so impressed that my three-year-old was helping to clean the place up. Later, I walked past her room. Each object that I had asked her to take to her room was barely, I mean barely, past the threshold to her bedroom. She had helped make the family room look neater, but she hadn’t really helped clean up much. Each of those things still had to be put away. And yet, my daughter did exactly what I asked her to do. I asked her to put the things in her room and she put those things in her room. Even though she did what I said, she didn’t do what I wanted her to do. She didn’t understand the context that relates the specific words I used to my intent, which was to clean up the apartment.
My daughter is 13 now. She understands the context surrounding those words far better than she did when she was three. Now, when I ask her to take things to her room, she knows that what I mean is for her to take her things into her room and put them away. One important difference between my daughter at three and my daughter at 13 is her ability to understand the surrounding context and to let that context inform her understanding of language. In general, the surrounding context shapes the meaning of the words. Understanding context is something humans do naturally, but it’s a hard problem for AI.
R2-D2 from the Star Wars franchise might have video sensors to perceive the physical world and computer vision algorithms to do object detection and classification. He might have natural language processing algorithms that allow him to interpret human instructions (in any of several languages). He might have Roomba-like mapping and navigation algorithms that allow him to navigate rooms and hallways and avoid stairs (I don’t think I’ve ever seen R2-D2 take on stairs). But even if all these discrete AI tools are integrated into R2-D2’s operating system, how does he come to have the background, history, and understanding of the world around him to situate human instructions in the proper context? The question we are faced with today is whether large language models will enable robots to understand that context.
There was a moment back in 2023 when I thought robots might be on the verge of having R2-D2-like context. New York Times tech reporter Kevin Roose published an article describing his sneak peek at Google’s latest robot innovation, RT-2. He opens the article:
A one-armed robot stood in front of a table. On the table sat three plastic figurines: a lion, a whale, and a dinosaur.
An engineer gave the robot an instruction: ‘Pick up the extinct animal.’[1]
Think about how much context is packed into the engineer’s instruction. What does it mean to “pick something up”? What is an animal? What does it mean to be extinct? Are we talking about real animals or images of animals or, as in this case, figurines? None of that context is spelled out in the instruction itself. If you or I were sitting at the table looking at the lion, whale, and dinosaur figurines, we would be able to infer all the context we need. This is an easy problem for us to solve because context is easy for us.
Roose describes what happened next: “The robot whirred for a moment, then its arm extended and its claw opened and descended. It grabbed the dinosaur.”
When I first read that article, I thought we might be on the verge of a breakthrough in robotics. Perhaps the context robots need—and that engineers have, for decades, tried to provide through complex arrays of physical multi-modal sensors—can be provided instead through a network of generative AI models. The machine can interpret human instructions in ordinary human languages. Based on the large language model’s training data, the machine can make inferences about which animals are extinct and about how references to “picking up the animal” can be associated with “picking up the toy animal.”
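To make that idea concrete, here is a minimal sketch of the division of labor I have in mind: a vision model reports what’s on the table, a language model supplies the background knowledge (“dinosaurs are extinct”), and only then does anything get handed to a motion planner. Everything below is a stand-in I made up for illustration, not Google’s RT-2 interface; the detector and the model call are mocked so the example runs on its own.

```python
# Hypothetical sketch: an LLM supplies the context that links an open-ended
# instruction to one of the objects the robot's vision system can already see.
# The detector and the LLM call below are mocked stand-ins, not a real robot API.

def detect_objects(camera_frame):
    # Stand-in for a vision model; a real system would return labels
    # (and grasp poses) computed from the camera feed.
    return ["lion", "whale", "dinosaur"]

def ask_llm(prompt):
    # Stand-in for a call to a large language model. The inference the real
    # model would make from its training data is hard-coded here.
    return "dinosaur" if "extinct" in prompt else "unknown"

def choose_target(instruction, camera_frame):
    objects = detect_objects(camera_frame)
    prompt = (
        f"Objects on the table: {', '.join(objects)}. "
        f"Instruction: {instruction!r}. Which object should the robot grasp?"
    )
    target = ask_llm(prompt)
    # Only hand a target to the motion planner if it matches something
    # the vision system actually detected.
    return target if target in objects else None

print(choose_target("Pick up the extinct animal", camera_frame=None))  # -> dinosaur
```

The interesting work, of course, is hidden inside those two stand-ins; the sketch only shows why the combination is appealing: perception says what is there, the language model supplies the missing context, and the robot never acts on something the cameras didn’t see.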
But this application of LLMs to robotics suffers from the same limitation as every other application of LLMs. What happens when they get the answers wrong?
Roose describes this phenomenon with the Google robot, too.
The robot wasn’t perfect. It incorrectly identified the flavor of a can of LaCroix placed on the table in front of it. (The can was lemon; RT-2 guessed orange.) Another time, when it was asked what kind of fruit was on a table, the robot simply answered, “White.” (It was a banana.)
So far, after all the reading and thinking (and writing) I’ve done about large language models, the position I always fall back to is this: The degree of autonomy you give an LLM-enabled system should be inversely correlated with the consequence of failure. High consequence of failure? Low autonomy. Low consequence of failure? High autonomy.
If you have a low-consequence task, say, brainstorming ideas for a new marketing strategy, then use LLMs all day long and feel free to give them as much autonomy as you like. Because we’re only talking about the brainstorming phase (and not, say, the very expensive and irrevocable implementation of a marketing strategy), this is a low-consequence task.
If you have a high-consequence task, say, planning for a major military operation, you can still use LLMs, but you wouldn’t want to give them any autonomy. You’d want a human to review every option the LLM provided for accuracy and reasonableness.
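If it helps to see that rule of thumb written down, here is a toy sketch. The three tiers, their labels, and the default behavior are my own illustrative assumptions, not a formal policy from any organization.

```python
# Toy sketch of the fallback rule: autonomy should fall as the consequence
# of failure rises. The tiers below are illustrative assumptions.

def allowed_autonomy(consequence_of_failure: str) -> str:
    policy = {
        "low": "full autonomy: let the model act, review afterwards",
        "medium": "supervised: a human approves each action before it runs",
        "high": "advisory only: the model drafts options, humans decide and act",
    }
    # Default to the most restrictive tier when the consequence is unclear.
    return policy.get(consequence_of_failure, policy["high"])

print(allowed_autonomy("low"))   # brainstorming a marketing strategy
print(allowed_autonomy("high"))  # planning a major military operation
```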
The thing about robots is that, because they interact with the physical world, they tend to be fairly high-consequence propositions. The first human being ever killed by a robot wasn’t Sarah or John Connor, and the robot wasn’t a humanoid with laser guns and hatred in its heart.
In 1979 at a Ford Motor Company casting facility, a one-ton robot handled parts retrieval. One day, the machine was moving slowly and 25-year-old Robert Williams grew impatient. He climbed up the storage rack to retrieve parts manually, but the robot continued as if completely unaware of Williams’s presence. The robot crushed Williams’s head, killing him instantly.
Releasing autonomous robots into the world is a dangerous proposition. And ensuring they behave in exactly the ways we want and expect is hard. As hallucinations establish themselves as a permanent bug in LLMs, I’m not sure we’re ever going to see LLM-enabled autonomous robots operating around us in the real world. And that’s because we’re all much more likely to be killed by a one-ton parts retrieval robot than by a Nexus-6 Replicant.
Credit where it’s due
Views expressed are those of the author and do not necessarily reflect those of the US Air Force, the Department of Defense, or any part of the US Government.