How Deep Reinforcement Learning can help chatbots

(invited by VentureBeat and IEEE, to appear July 2016)

Li Deng, IEEE Fellow and Chief Scientist of AI

Microsoft Research & ASG, Redmond

In March this year, Satya Nadella -- the CEO of Microsoft -- talked about the industry trend of using human language more pervasively for interaction with computing devices, a trend he called “conversation as a platform.”

He also announced several bot initiatives, including the company’s bot framework. In April, Facebook launched its Messenger platform with bots. Then, in May, Google announced its attempt to develop AI-powered bots, called Google Assistant. Since then, bots have been widely regarded as a new user interface (UI) that will fundamentally change how people experience computing.

What is wrong with Apps and Web models?

The app platform was invented by Apple for smartphones many years ago, and Google followed with its Google Play Store. The app platform is based on a uniform resource model (a model that suggests a phone with a certain amount of memory and processing speed can provide everything you need). The downside is the large array of apps that now clutter the screens of mobile phone users, even though they consistently use fewer than 20 or so apps.

In fact, the number of actively used smartphone apps is decreasing. And in aggregate, although millions of apps have been written and distributed, many have never been used. This undoubtedly represents a substantial waste of both the device’s resources and the user’s time in downloading, installing, and managing apps.

The Web model is even worse for mobile UI. The usage of web services through mobile access is low. The low usage rates stem from the fact that many popular websites are designed and optimized for non-mobile PCs, which are typically equipped with broadband access.

The limited bandwidth and computing resources of a smartphone are serious impediments for many web services. Moreover, most websites follow the conventional paradigm of page-centric disposition of information, resulting in the (non-mobile) browser being implicitly modeled after a book reader, which is suboptimal for mobile devices.

The result: We need to redesign the mobile phone UI from the ground up in order to realize the full potential of the mobile era.

Conversation as an emerging paradigm for mobile UI

There is, fortunately, an emerging paradigm for this redesign and re-implementation that scales well for mobile phones. This new “conversation as a platform” paradigm enables mobile users to discover, access, and interact with the information and services that matter to their day-to-day lives, with useful information and services integrated naturally into the conversation streams.

The conversational UI paradigm will result in the formation of a new ecosystem capable of having a greater scale advantage than the previous Web and Apps ecosystems. This is possible because the information industry is entering a new era of digitizing the physical world and linking those connections to physical services. This physical, interactive, and service-centric world goes far beyond the previous era of static web-information disposition.

Messaging is core to the new conversational paradigm, which comprises a stream of short text, audio, and video messages. Because it is asynchronous and yet near real-time, messaging becomes an accelerator, driving the growth of digital conversation. Users no longer need to pay the coordination cost of a pre-scheduled time of interaction, yet they retain the ability to have near real-time conversations.

Bots as intelligent conversational interface agents

More essential to the conversation-centric mobile UI paradigm are various types of intelligent services afforded in conversational interactions. We have intelligent and personal assistants like Siri, Google Now, Cortana, and Alexa. We also have individual bots that can be accessed by an automated conversational interface.

The endgame of the emerging conversational paradigm: There will likely be no app for the user to download, and the necessary service resources, likely residing in the cloud, will be provided automatically as the AI bots intelligently monitor and respond through the messaging UI, utilizing speech and natural language capabilities.

A.I. bots are now possible due to the recent huge advances in machine learning and A.I. These advances enable us to provide more and more automation for things we care about. The rise of deep learning in the past several years, particularly deep reinforcement learning (RL) in the past 1.5 years, makes effective use of the increasing amount of data and computing resources, boosting our ability to build computational models for the world environment and for any application domain relevant to our lives.

In machine learning, RL has distinct characteristics. The bot learns from the feedback it gets through interaction with the user, and that feedback is summarized as a reward. It is similar to chess, where you might not know the reward as you play but you will understand it eventually. The interactions with the A.I. are expected to lead to an outcome in which a task is completed. If the goal is to reserve a flight, the bot says only things that are meant to achieve that goal, even if the user does not realize at each turn that the bot is working toward the ultimate goal and reward.
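The delayed reward described above can be made concrete with a small calculation. The following is a minimal Python sketch, not a method taken from any particular bot: it assumes a four-turn flight-booking dialogue in which only the final turn is rewarded, and it uses the standard discounted return from RL to show how credit flows back to earlier turns. The function name, the episode, and the discount factor of 0.9 are illustrative assumptions.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return the discounted return G_t for every turn t of one episode.

    G_t = r_t + gamma * G_{t+1}, computed backwards from the last turn.
    """
    returns = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# Hypothetical episode: three turns with no immediate reward,
# then a reward of 1.0 when the flight is finally reserved.
episode_rewards = [0.0, 0.0, 0.0, 1.0]
returns = discounted_returns(episode_rewards, gamma=0.9)

for turn, value in enumerate(returns):
    print(f"turn {turn}: return = {value:.3f}")
```

Even though the first three turns earn nothing on their own, each receives a positive return because it leads toward the rewarded outcome, and turns closer to the booking receive more.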

These advances bring automated speech and natural language understanding within reach, eventually allowing us to solve conversational understanding and dialogue problems over many domains. The A.I. bots built with deep RL will understand the semantics of all domains and be capable of scaling across domains in ways that are not possible today.

A.I. bots will take iterations and feedback loops to develop and to perfect. The environment models built into the RL component of A.I. bots will be able to detect, acquire, create, and grow new knowledge automatically and gracefully, allowing us to develop more and more intelligent services and experiences including, in particular, booking, payment, and other action-oriented services.

Three types of A.I. bots

Broadly speaking, there are three types of A.I. bots. The first type seeks information, and its goal is clear. The second type also involves seeking information, but the ultimate goal is not immediately clear. You might ask for the opening hours of a movie theater. That is not the whole of your goal, but a step toward the ultimate goal of seeing a movie. For the first type, the reward is well defined. For the second, it is also fairly clear (or it will be).

When built with the powerful A.I. technology of deep RL, the first two types have reward functions (a key component of RL) that are rather straightforward to define. These are bots that seek information and bots that aim to complete specific tasks, such as booking an airline ticket or reserving a hotel room.
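To illustrate why such reward functions are straightforward to define, here is a minimal Python sketch for a task-completion bot. It is an assumption-laden example, not a published design: the `TurnOutcome` type, the `task_reward` function, and the numeric values (a small per-turn cost, a bonus for success, a penalty when the user gives up) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TurnOutcome:
    """Hypothetical summary of what happened after one bot turn."""
    task_completed: bool
    user_gave_up: bool


def task_reward(outcome, turn_penalty=-0.05, success_bonus=1.0,
                failure_penalty=-1.0):
    """Reward for one turn of a task-completion dialogue.

    Success and failure are terminal and clearly observable; every
    other turn costs a little, which favors shorter dialogues.
    """
    if outcome.task_completed:
        return success_bonus
    if outcome.user_gave_up:
        return failure_penalty
    return turn_penalty


# Hypothetical episode: two clarifying turns, then the ticket is booked.
episode = [
    TurnOutcome(task_completed=False, user_gave_up=False),
    TurnOutcome(task_completed=False, user_gave_up=False),
    TurnOutcome(task_completed=True, user_gave_up=False),
]
total = sum(task_reward(outcome) for outcome in episode)
print(f"total reward: {total:.2f}")
```

The point of the sketch is that every quantity it needs (was the ticket booked, did the user abandon the conversation, how many turns were used) can be observed directly, which is what makes the reward easy to write down.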

The third type of A.I. bot, and the one in need of the most guidance, is the social bot, sometimes also called a chatbot or chitchat bot. The reward function of this type of bot (used in deep RL algorithms) -- loosely described as “emotional intelligence” -- cannot be easily quantified. One example would be a user asking for advice, or asking something vague such as for ideas about what to do today.

To provide a mathematical foundation for a broadened capability that can handle the potentially complex reward functions for social bots, the research community as well as practitioners need an in-depth investigation. The goal here is to extend the commonly used RL algorithms (e.g. those used as a key learning method in AlphaGo) to better algorithms that exploit information-theoretic and intrinsically motivated rewards.
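One simple way to picture an information-theoretic, intrinsically motivated reward is as a bonus for reducing uncertainty. The Python sketch below is only an illustration under stated assumptions and is not the extension the research community will necessarily adopt: it assumes the bot keeps a probability distribution (a "belief") over what the user wants, and it adds to the ordinary reward a bonus proportional to the drop in Shannon entropy of that belief after a turn. The function names and the weight `beta` are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def combined_reward(extrinsic, belief_before, belief_after, beta=0.5):
    """Extrinsic reward plus a bonus for information gained this turn.

    The bonus is the reduction in entropy of the bot's belief about the
    user's intent; beta sets how much that bonus counts.
    """
    info_gain = entropy(belief_before) - entropy(belief_after)
    return extrinsic + beta * info_gain


# Hypothetical turn: the bot starts with four equally likely user
# intents and, after one exchange, has narrowed them down to two.
before = [0.25, 0.25, 0.25, 0.25]
after = [0.5, 0.5, 0.0, 0.0]
reward = combined_reward(extrinsic=0.0, belief_before=before,
                         belief_after=after)
print(f"reward with intrinsic bonus: {reward:.2f}")
```

Here the turn earns a positive reward even though no task was completed, because the bot learned something about the user. Whether a quantity like this tracks emotional satisfaction is exactly the open question the paragraph above raises.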

Such rewards would capture aspects of the emotional satisfaction of users in bot-user conversations before switching to other types of bot conversations that try to get tasks completed. This is a particularly fertile A.I. research area for computer scientists and electrical engineers alike.