Me: “Should I go to bed, Siri?”
Siri: “I think you should sleep on it.”
It’s hard not to admire a smart-aleck reply like that. Siri—the “intelligent personal assistant” built into Apple’s iPhone 4S—often displays this kind of attitude, especially when asked a question that pokes fun at its artificial intelligence. But the answer is not some snarky programmers’ joke. It’s a crucial part of why Siri works so well.
The popularity of Siri shows that a digital assistant needs more than just intelligence to succeed; it also needs tact, charm, and, surprisingly, wit. Errors cause frustration and annoyance with any computer interface. The risk is amplified dramatically with one that poses as a conversational personal assistant, a fact that has undone some socially stunted virtual assistants in the past. So for Siri, being likable and occasionally kooky may be just as important as dazzling users with feats of machine intelligence.
Siri has its origins in a research project begun in 2003 and funded by the U.S. military’s Defense Advanced Research Projects Agency (DARPA). The effort was led by SRI International, which in 2007 spun off a company that released the original version of Siri as an iPhone app in February 2010 (the technology was named among Technology Review’s 10 Emerging Technologies in 2009). This earlier Siri was less capable than the one later built into the iPhone 4S. It was able to access a handful of online services for making restaurant reservations, buying movie tickets, and booking taxis, but it was error-prone and never became a big hit with users. Apple bought the startup behind Siri for an undisclosed sum just two months after the app made its debut.
The Siri that appeared a year and a half later works astonishingly well. It listens to spoken commands (in English, French, German, and Japanese) and responds with either an appropriate action or an answer spoken in a calm, suitably robotic female voice. Ask Siri to wake you up at 8:00 a.m. and it will set the phone’s alarm clock accordingly. Tell Siri to send a text message to a friend and it will dutifully take dictation before firing off your missive. Say “Where can I find a burrito, Siri?” and Siri will serve up a list of well-reviewed nearby Mexican restaurants, found by querying the phone’s location sensor and performing a Web and map search. Siri also has countless facts and figures at its fingertips, thanks to the online “answer engine” Wolfram Alpha, which has access to many databases. Ask “What’s the radius of Jupiter?” and Siri will casually inform you that it’s 42,982 miles.
Siri’s charismatic quality is entirely lacking in other natural-language interfaces. Several companies sell virtual customer service agents capable of chatting with customers online in typed text. One example is Eva, created by the Spanish company Indisys. Eva can chat comfortably until the conversation begins to stray from the areas it’s been trained to talk about; when it does, Eva rather rudely attempts to push you back toward those topics.
Siri also has some closer competitors in the form of apps available for iPhones and Android devices. Evi, made by True Knowledge; Dragon Go, from the voice-recognition company Nuance; and Iris, made by the Indian software company Dexetra, are all variations on the theme of a voice-controlled personal assistant, and they can often match Siri’s ability to understand and carry out simple tasks, or to retrieve information. But they are much less socially adept. When I asked Iris if it thought I should go to sleep, “Perhaps you could use the rest” was its flat, humorless response.
Impressive though Siri is, however, the AI involved is not all that sophisticated. Boris Katz, a principal research scientist at MIT’s Computer Science and Artificial Intelligence Lab, who’s been building machines that parse human language for decades, suspects that Siri doesn’t put much effort into analyzing what a person is asking. Instead of figuring out how the words in a sentence work together to convey meaning, he believes, Siri often just recognizes a few keywords and matches them with a limited number of preprogrammed responses. “They taught it a few things, and the system expects those things,” he says. “They’re very clever about what people normally ask.”
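To see how little machinery that approach requires, here is a minimal sketch in Python of the kind of keyword spotting Katz describes. The trigger words and canned replies are invented for illustration; Apple has never published how Siri actually works.

```python
import random
import re

# Invented trigger keywords mapped to invented canned replies,
# illustrating the keyword-spotting approach Katz describes.
CANNED_RESPONSES = {
    frozenset({"sleep", "bed"}): [
        "I think you should sleep on it.",
        "Rest is important. Good night.",
    ],
    frozenset({"meaning", "life"}): [
        "42, of course.",
        "I can't answer that now, but give me some time.",
    ],
}

def reply(utterance):
    """Pick a canned reply if any trigger keyword appears in the input."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    for keywords, responses in CANNED_RESPONSES.items():
        if words & keywords:
            # Sentence structure is ignored entirely; one keyword is enough.
            return random.choice(responses)
    return "I don't understand."

print(reply("Should I go to bed, Siri?"))  # e.g. "I think you should sleep on it."
```

A matcher this crude breaks down as soon as a question is phrased unexpectedly, which may be why, as Katz puts it, Siri’s designers were “very clever about what people normally ask.”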
In contrast, conventional artificial-intelligence research has long striven to parse more complex meaning in conversations. In 1985, Katz began building a system called START, which answers typed questions by analyzing how the words in a sentence are arranged in order to interpret the meaning of what’s being asked. This enables START to answer questions phrased in complex ways or with some degree of ambiguity.
In 2006—a year before SRI spun off its startup—Katz and colleagues demonstrated a software assistant based on START that could be accessed by typing queries into a mobile phone. The concept was remarkably similar to Siri’s, but this part of the START project never progressed any further; it took a back seat to Katz’s real objective—to create a machine that can better match the human ability to use language.
START is just a tiny offshoot of the research into artificial intelligence that began some 50 years earlier as an attempt to understand the functioning of the human mind and to create something analogous in machines. That effort has produced many truly remarkable technologies, capable of performing computational tasks that are impossibly complicated for humans. But artificial-intelligence research has failed to re-create many aspects of human intellect, including language and communication. As Katz explains, a simple conversation between two people can tap into the full depth of a person’s life experiences, and this remains impossible to mimic in a machine. So even as AI systems have become better at accessing, processing, and presenting information, human communication has continued to elude them.
Despite being less capable than START at dealing with the complexities of language, Siri shows that a machine can pull off just enough tricks to fool users into feeling as if they’re having something approximately like a real conversation. To understand how difficult it is to get even simple text-based communication right, you need look no further than the infamous intelligent assistant introduced by Microsoft back in 1997. This annoying virtual paper clip, called Clippy, would pop up whenever a user created a document, offering assistance with a message such as the infuriating line “It looks like you’re writing a letter. Would you like help?” Microsoft expected users to love Clippy. Bill Gates thought fans would design Clippy T-shirts, mugs, and websites. So the company was stunned and confused when users hated Clippy, creating T-shirts, mugs, and websites dedicated to disparaging it. The response was so bad that Microsoft killed Clippy off in 2007.
Before it did, Microsoft hired Stanford professor Clifford Nass, an expert on human-computer interaction, to investigate why the program had inspired so much unpleasantness. Nass, who is the author of The Man Who Lied to His Laptop: What Machines Teach Us about Human Relationships, has spent years studying similar phenomena, and his work suggests a fairly simple cause: people instinctively apply the rules of human social interactions to dealings with computers, cell phones, robots, in-car navigation systems, and similar machines. Nass realized that Clippy broke just about every norm of acceptable social behavior. It made the same mistakes again and again, and constantly pestered users who wanted to be left alone. “Clippy’s problem was it said ‘I’ll do everything’ and then proceeded to disappoint,” says Nass. Just as a person who repeats the same answer again and again makes us feel insulted, Nass says, so does a computer interface—even if we know full well we’re dealing with a machine.
Clippy showed that attempting more humanlike communication can backfire spectacularly if the subtleties of social behavior aren’t understood and respected. Nass says Apple did everything possible to make Siri likable. Siri doesn’t impose itself on the user at all. The application runs in the background on the iPhone, leaping to attention only when the user holds down the “home” button or puts the phone to his or her ear and starts speaking. It also avoids making the same mistake twice, trying different answers when the user repeats a question. Even the tone of Siri’s voice was carefully chosen to be inoffensive, Nass believes.
Apple also limited the tasks Siri can perform and the answers it can give, most probably to avoid disappointment. If you ask Siri to post something to Twitter, for example, it’ll sheepishly admit that it doesn’t know how. But since the alternative could be accidentally broadcasting garbled tweets, this strategy is understandable.
The accuracy of Siri’s voice recognition also helps avoid disappointment. The system does sometimes mishear words, often with amusing results. “I’m sorry, Will, I don’t understand ‘I need pajamas’ ” was a curious response to a question that had nothing to do with pajamas. But mostly the voice system works remarkably well. It has no problem with my English accent or with many complex words and phrases, and this overall accuracy makes the odd mistake that much more acceptable.
A key challenge for Apple was that soon after meeting Siri, a person may feel a powerful urge to trip up this virtual know-it-all: to ask it the meaning of life, whether it believes in God, or whether it knows R2-D2. Apple chose to handle this phenomenon in an inventive way: by making sure Siri gets the joke and plays along. Thus it has a clever answer for just about any curveball thrown at it and even varies its responses, a trick that makes it seem eerily human at times.
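One plausible way to achieve that variation, sketched here in Python with invented replies, is simply to rotate through a list of answers so that repeating a question never yields the same response twice in a row.

```python
import itertools

# Invented joke replies; the real mechanism behind Siri's banter is not public.
JOKE_REPLIES = itertools.cycle([
    "I think you should sleep on it.",
    "That's really up to you.",
    "Only you can answer that.",
])

def banter():
    """Return the next reply in rotation, so a repeated question gets a fresh answer."""
    return next(JOKE_REPLIES)

for _ in range(3):
    print(banter())  # A different reply each time the question is asked.
```

However Apple actually implements it, the effect is the same: ask twice and you never hear the identical reply.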
This banter also helps lessen the blow when Siri misunderstands something or is stumped by a surprisingly simple question. Once, when I asked who won the Super Bowl, it proudly converted one Korean won into dollars for me. I knew this was just an algorithmic error in a distant bank of computer servers, but I also felt the urge to interpret it as Siri being zany.
Nass says the way Siri handles humor is inspired. Research has revealed, he notes, that humor makes people seem smarter and more likable. “Intermittent, innocent humor has been shown, for both people and computers, to be effective,” Nass says. “It’s very positive, even for the most boring, staid computer interface.”
But Katz, as someone who has been striving for decades to give machines the ability to use language, hopes eventually to see something much more sophisticated than Siri emerge: a machine capable of holding real conversations with people. Such machines could provide fundamental insights into the nature of human intelligence, he says, and they might provide a more natural way to teach machines how to be smarter.
That might continue to be the dream of AI researchers. For the rest of us, though, the arrival of a virtual assistant that is actually useful is just as fundamental a breakthrough. In Katz’s office at MIT, I showed him some of the amusing answers Siri comes up with when provoked. He chuckled and remarked on the cleverness of the engineers who designed Siri, but he also spoke as an AI researcher, using meanings and words that Siri would undoubtedly struggle with. “There’s nothing wrong with having gimmicks,” he said, “but it would be nice if it could actually analyze deeply what you said. The conversations with the user will be that much richer.”
Katz is right that a more revolutionary intelligent personal assistant—one that’s capable of performing many more complicated tasks—will need more advanced AI. But that view underplays an important innovation behind Siri. After testing the app a while longer, Katz confessed that he admires entrepreneurs who know how to turn advances in computer science into something that ordinary people will use every day. “I wish I knew how people do that,” he admitted.
For the answer, perhaps he just needs to keep talking to Siri.
Will Knight is Technology Review’s online editor.