Voice computing is rapidly changing the way people interact with technology, which will have a dramatic impact on how learners expect to interact with eLearning technology.

According to journalist and author James Vlahos, “The advent of voice computing is a watershed moment in human history because using words is the defining trait of our species.”

In his recent book, Talk to Me, Vlahos describes the technological advances that are enabling progress toward artificial intelligence agents that can converse almost naturally with humans. These include exponential increases in computing power, advances in machine learning, and the immense power that cloud computing makes available to small mobile devices.

These advances, Vlahos wrote, are “ushering in what’s known as ‘ambient computing,’”—ultimately pointing toward an era where physical devices might fade into the background: “With voice, computers are to be ubiquitous rather than discrete, invisible rather than embodied.”

Two approaches to natural conversation

Though people have endeavored for decades to teach computers to converse naturally with humans, dialogue turns out to be surprisingly complex. In an eLearning Guild research report, Jane Bozarth illustrated the complexity of a “simple” task: training an AI conversational agent to take a pizza order. What Bozarth describes as “density” is the sheer amount “of information and variations on questions and responses” that the agent would need to “understand” to provide a “smooth” conversational experience to a customer.

Innovators have taken two main approaches to “teaching” their AI agents to converse with humans:

  • A rules-based approach that uses thousands, or millions, of rules to decipher speech, convert it to words, apply context-based rules to “understand” the words and sentences, and respond appropriately
  • A machine-learning approach that uses deep learning to both “understand” requests and generate replies

Rules-based conversation

Conversation-guiding rules touch on everything from parts of speech and grammar to the various meanings of a word. Templates for various topics and conversational flows provide structure, and the bot or AI agent can fill in factual information it pulls from a database or looks up online.

Cloud computing offers access to vast amounts of information. When someone asks Alexa or Siri about the weather, they want a current response, not something written weeks or months ago. But the outline of the response, “Today in Missoula it will be …,” can be scripted. The agent simply looks up a current forecast and fills in temperatures and conditions like overcast or snowing.
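To make the idea of a scripted outline concrete, here is a minimal sketch in Python of a response template with slots that are filled from a lookup. Everything named here is an assumption for illustration: the `Forecast` record, the `lookup_forecast` helper, and the hard-coded sample data stand in for whatever live weather service a real assistant would call, and the template wording is invented rather than taken from any actual product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Forecast:
    """Hypothetical record holding the facts the template needs."""
    city: str
    high_f: int
    conditions: str


# Assumption: a tiny in-memory table standing in for a live weather service.
_SAMPLE_FORECASTS = {
    "missoula": Forecast(city="Missoula", high_f=34, conditions="snowing"),
}

# The scripted outline of the reply; only the slots in braces change.
WEATHER_TEMPLATE = (
    "Today in {city} it will be {conditions} with a high of {high_f} degrees."
)


def lookup_forecast(city: str) -> Optional[Forecast]:
    """Return the current forecast for a city, or None if it is unknown."""
    return _SAMPLE_FORECASTS.get(city.strip().lower())


def answer_weather(city: str) -> str:
    """Fill the scripted template with freshly looked-up facts."""
    forecast = lookup_forecast(city)
    if forecast is None:
        # A rule is also needed for the case where the lookup fails.
        return f"Sorry, I don't have a forecast for {city}."
    return WEATHER_TEMPLATE.format(
        city=forecast.city,
        conditions=forecast.conditions,
        high_f=forecast.high_f,
    )


if __name__ == "__main__":
    print(answer_weather("Missoula"))
```

The wording of the reply never changes; only the looked-up values do, which is why the answer can be scripted in advance and still be current.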

A generalist, like Siri or Alexa, might be asked anything on any topic, which makes planning for and devising rules for the conversation challenging. Some developers try to anticipate a vast range of conversational prompts and create rules for appropriate responses.

Others turn to machine learning.

Machine learning and conversation

Some teams competing for the “Alexa Prize,” a competition to develop ever-better conversational AI “socialbots,” use vast databases of conversations, such as movie dialogues or transcripts of customer service calls, to “train” the AI.

Using machine learning, the bots learn patterns and generate responses. But conversations are not predictable, and pattern matching is insufficient to cover all of the variability, topic changes, slang, and other elements that can confuse a bot.
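The limits of pattern matching can be illustrated with a deliberately simple sketch. The Python below is not a deep-learning model; it is a stand-in that retrieves the reply whose example prompt shares the most words with what the user said. The three-line corpus, the function names, and the word-overlap (Jaccard) score are all assumptions chosen for illustration, not the method used by any Alexa Prize team.

```python
import re
from typing import List, Set, Tuple

# Assumption: a toy corpus of (prompt, reply) pairs standing in for a large
# collection of movie dialogue or customer-service transcripts.
CORPUS: List[Tuple[str, str]] = [
    ("what is your favorite movie", "I love a good science fiction film."),
    ("my order has not arrived", "I'm sorry to hear that. What is the order number?"),
    ("how do i reset my password", "You can reset it from the account settings page."),
]


def tokens(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Word overlap between two sets, from 0.0 (none) to 1.0 (identical)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_reply(utterance: str) -> Tuple[float, str]:
    """Return (score, reply) for the corpus prompt most similar to the input."""
    query = tokens(utterance)
    scored = [(jaccard(query, tokens(prompt)), reply) for prompt, reply in CORPUS]
    return max(scored, key=lambda pair: pair[0])


if __name__ == "__main__":
    print(best_reply("How do I reset my password?"))
    # Slang the corpus has never seen shares no words with any prompt.
    print(best_reply("yo that flick was fire"))
```

A request that closely resembles the training data gets a sensible reply, while slang with no matching words scores zero, so whatever reply comes back is effectively arbitrary. Real systems learn far richer patterns, but the same gap appears whenever a conversation strays from what the bot has seen.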

A blended approach

Each approach on its own has serious limitations, so many developers, including teams competing for the Alexa Prize, use elements of both. The ultimate goal is to develop AI technology that allows consumers to have “novel, engaging conversations” with their Alexa voice assistants.
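One common way to combine the two, sketched below in Python, is to try hand-written rules first and fall back to a learned model only when no rule applies. This is an assumed arrangement for illustration, not a description of how any particular team builds its bot. The rule list is invented, and `learned_reply` is a placeholder that returns a canned line where a trained model would generate text.

```python
import re
from typing import Callable, List, Optional, Pattern, Tuple

# Assumption: two hand-written rules, each pairing a pattern with a reply.
RULES: List[Tuple[Pattern[str], str]] = [
    (re.compile(r"\bweather\b", re.IGNORECASE), "Let me look up today's forecast."),
    (re.compile(r"\b(hi|hello)\b", re.IGNORECASE), "Hello! What would you like to talk about?"),
]


def rule_reply(utterance: str) -> Optional[str]:
    """Return the reply of the first matching rule, or None if none match."""
    for pattern, reply in RULES:
        if pattern.search(utterance):
            return reply
    return None


def learned_reply(utterance: str) -> str:
    """Placeholder for a trained model; a real one would generate a reply."""
    return "Tell me more about that."


def respond(utterance: str, fallback: Callable[[str], str] = learned_reply) -> str:
    """Prefer a scripted rule; otherwise hand the utterance to the fallback."""
    scripted = rule_reply(utterance)
    return scripted if scripted is not None else fallback(utterance)


if __name__ == "__main__":
    print(respond("What's the weather like?"))
    print(respond("I saw a great film last night."))
```

The rules give predictable answers for requests the developers anticipated, and the fallback keeps the conversation going when they did not.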

eLearning offers a natural venue for voice computing

Many of the difficulties developers encounter when creating “socialbots” for general conversation are less problematic in a more constrained environment. Using conversational AI in an eLearning context, where learners are focused on a single topic or narrow range of content, could become feasible before developers achieve the goal of an AI that excels at general conversation.