The Chinese Internet giant Baidu launched a conversational personal assistant service called Duer at a company event held in Beijing on Tuesday. It is just the latest sign that we could soon forgo swiping and typing in favor of chatting with our computers.
The assistant service is designed to provide quick and easy access to Baidu’s various Internet services and to engage in a dialogue with users rather than simply being voice-controlled. Duer (which means “Du secretary”) is bundled with the latest versions of Baidu’s apps for smartphones.
Duer’s success will depend on how well it can parse naturally spoken language. This is notoriously difficult, although researchers have been making significant progress in recent years in both speech recognition and, to a lesser degree, natural language processing thanks to a powerful machine-learning technique known as deep learning. Companies such as Facebook see natural language as a key challenge for mining information and communicating with users (see “Teaching Machines to Understand Us”).
According to Baidu, Duer will mine meaning from written information on the Web. Baidu will collect information about a restaurant, for example, and Duer will infer whether it is pet-friendly or has outdoor seating. In contrast, most voice apps simply tap into conventional search engines, which do not try to extract meaning from information online.
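To make the idea of inferring attributes from text concrete, here is a minimal sketch in Python. It is not Baidu's method, which the company has not described in detail and which presumably relies on learned models rather than hand-written rules. The attribute names, the cue phrases, and the `infer_attributes` helper are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical cue phrases for each attribute (assumptions for illustration).
# A production system would learn such signals from data instead.
ATTRIBUTE_CUES = {
    "pet_friendly": [
        r"\bpet[- ]friendly\b",
        r"\bdogs? (?:are )?(?:welcome|allowed)\b",
    ],
    "outdoor_seating": [
        r"\boutdoor seating\b",
        r"\bpatio\b",
        r"\bterrace\b",
    ],
}


def infer_attributes(text):
    """Return a dict mapping each attribute to True if any cue appears in text."""
    lowered = text.lower()
    return {
        name: any(re.search(pattern, lowered) for pattern in patterns)
        for name, patterns in ATTRIBUTE_CUES.items()
    }


if __name__ == "__main__":
    review = "Lovely terrace, and dogs are welcome on weekdays."
    print(infer_attributes(review))
```

A rule-based matcher like this is easily fooled by negation ("no dogs allowed") and by phrasing it has never seen, which is one reason extracting meaning from free text is considered hard.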
Andrew Ng, chief scientist for Baidu Research in Silicon Valley and an expert in the field of deep learning, has said that recent advances will soon make voice control far more capable, and that this will usher in a new age of computer interaction.
Other companies are also pushing aggressively into voice-mediated computing. With more users expected to turn to voice interaction, many tech companies hope to provide capable voice services in order to gain a competitive advantage, or at least to avoid falling behind their rivals.
The U.S. companies Apple, Google, and Microsoft all include voice-controlled assistants in their smartphone operating systems. And in November of last year, the U.S. e-commerce giant Amazon launched a device for the home called Echo that includes a voice persona called Alexa. At launch, the Echo could be used to look up information from the Web, play podcasts or music from a user’s Amazon library, and add items to a shopping list.
Amazon released an application programming interface for the Echo earlier this year, allowing developers to connect the device to outside apps or services, thus giving it new skills. It also announced $100 million in funding for startups building voice services that connect with the Echo.
Matt Lease, an associate professor at the University of Texas, Austin, who specializes in parsing language using computers, says voice interfaces are advancing thanks to fundamental progress in areas such as deep learning combined with the ubiquity of portable devices, which have made people more familiar with voice control. “I don’t think there’s a huge, fundamental breakthrough,” Lease says. “But I’m more comfortable talking to my phone and I’m more comfortable talking to this thing in my living room.”