An approach to separate the task-specific information from the source code in Galaxy Architecture based Conversational Systems
Daniel Pérez, M.C., Ingrid Kirschning, PhD.
TLATOA Speech Processing Group, CENTIA,
Universidad de las Américas – Puebla, México.
Abstract: - There are several conversational systems around the world, but in most of them the task-specific information is embedded in the source code, so any change to the task is complicated and cumbersome and requires good programming skills. We think it should be possible to offer a system in which all the task-specific information is defined in text files completely separate from the source code. We therefore propose a Galaxy-based conversational system with a new task-specific-information server and task files, designed so that this server interacts directly with the system components and the developer only has to worry about the task definition itself. We expect that creating a new conversational system for a different task will be easier and faster than with current systems.
Key-Words: - Galaxy architecture, Conversational Systems, task specific information, domain independence.
1. Introduction
A conversational system [1] is a set of programs that work together using speech technologies, with the main purpose of helping a user achieve a goal using speech as the interaction mode. Usually, these systems work over the telephone as information systems in a restricted domain. The user interacts with the system through natural spoken dialogue to complete a specific task.
The main components of the typical logical structure of a conversational system are the Automatic Speech Recognizer (ASR), Natural Language Understanding (NLU), the Dialogue Manager (DM), the Natural Language Generator (NLG) and the Text-to-Speech synthesizer (TTS). The ASR transforms the audio signal into a text stream, and the NLU module (also known as the parser) interprets the recognized words. The DM decides what actions to carry out and when. It is responsible for the flow control of the conversation and for the connections with the final application and with data sources such as databases or the Internet. The NLG module converts "keywords" or results from a database query into sentences a human can understand, most of the time using templates. Finally, the TTS module converts the text generated by the NLG into a sound signal.
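The chain of modules just described can be illustrated with a toy pipeline in which each module is a plain function. This is a minimal sketch under loudly stated assumptions: the stage names mirror the text, but `CITIES`, the slot names `origin`/`destination`, and the templates are hypothetical, and the "audio" is simply UTF-8 text standing in for a real signal.

```python
from typing import Dict

# Hypothetical toy vocabulary used by the NLU stub (task knowledge, hard-coded).
CITIES = {"puebla", "denver", "boston"}

def asr(audio: bytes) -> str:
    """Stand-in ASR: treats the audio payload as already-transcribed UTF-8 text."""
    return audio.decode("utf-8").lower()

def nlu(text: str) -> Dict[str, str]:
    """Stand-in parser: maps 'from <city>' / 'to <city>' onto slots."""
    words = text.split()
    slots: Dict[str, str] = {}
    for previous, word in zip(words, words[1:]):
        if word in CITIES and previous == "from":
            slots["origin"] = word
        elif word in CITIES and previous == "to":
            slots["destination"] = word
    return slots

def dm(slots: Dict[str, str]) -> Dict[str, str]:
    """Stand-in dialogue manager: ask for the first missing slot, otherwise inform."""
    for required in ("origin", "destination"):
        if required not in slots:
            return {"act": "ask", "slot": required}
    return {"act": "inform", **slots}

def nlg(action: Dict[str, str]) -> str:
    """Template-based generation."""
    if action["act"] == "ask":
        return f"What is your {action['slot']}?"
    return f"Flights from {action['origin']} to {action['destination']}."

def tts(text: str) -> bytes:
    """Stand-in TTS: returns the text as bytes instead of a waveform."""
    return text.encode("utf-8")

def turn(audio: bytes) -> bytes:
    """One user turn through the ASR -> NLU -> DM -> NLG -> TTS chain."""
    return tts(nlg(dm(nlu(asr(audio)))))
```

For example, `turn(b"I want to fly from Puebla")` yields `b"What is your destination?"`. Note that the task knowledge here (city names, slot names, templates) is scattered across three functions, which is exactly the kind of coupling to the source code discussed next.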
The task-specific information is usually embedded in all the components of a conversational system, because the task is the system's reason for existing. Thus, a person without good programming skills and thorough knowledge of the system's source code will find it very difficult to create a new conversational system. We believe that all the task-related information should be placed in text files apart from the source code. These files have to contain all the information about the task: the fields contained in the database and their data types, the pieces of information the user may ask for, the relations between them, restrictions on the values, mappings to computational forms (canonical forms), etc.
2. Knowledge representation in conversational systems
Two important issues in conversational systems are the way they represent the task-specific information, i.e., the knowledge representation, and the way they use this information to manage the dialog efficiently. There are several approaches to knowledge representation, such as first-order predicate calculus, semantic networks, conceptual dependency diagrams and frame-based representations [2].
The Galaxy Architecture is a distributed, message-based, client-server infrastructure for creating conversational systems. The components are built as modules (servers) that communicate with each other via a hub, or central module [3]. Two of the most advanced non-commercial conversational systems are based on the Galaxy architecture: the CU Communicator [4] and the Mercury air travel agent [6]. Other systems include the TRIPS project [7], some approaches developed at AT&T Labs [8], and one more at the University of Beijing [9]. Most of these systems work in the air travel reservation domain.
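The hub-and-spoke idea can be sketched as follows: servers register with a hub and never call each other directly; every message carries a destination and the hub routes it. This is a toy illustration only, not the Galaxy Communicator API; the `Hub` class, the `"to"` field and the server names are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Message = Dict[str, object]
Handler = Callable[[Message], List[Message]]

class Hub:
    """Toy hub: servers never call each other directly. Every message goes
    through the hub, which routes it by its 'to' field."""

    def __init__(self) -> None:
        self.servers: Dict[str, Handler] = {}
        self.trace: List[str] = []     # order in which servers were invoked

    def register(self, name: str, handler: Handler) -> None:
        self.servers[name] = handler

    def send(self, message: Message) -> None:
        pending: Deque[Message] = deque([message])
        while pending:
            msg = pending.popleft()
            target = str(msg["to"])
            if target not in self.servers:
                raise KeyError(f"no server registered as {target!r}")
            self.trace.append(target)
            # A server may answer with follow-up messages; route those too.
            pending.extend(self.servers[target](msg))

# Usage: three stub servers chained only through the hub.
hub = Hub()
hub.register("nlu", lambda m: [{"to": "dm", "slots": {"city": m["text"]}}])
hub.register("dm", lambda m: [{"to": "nlg", "act": "confirm", "slots": m["slots"]}])
hub.register("nlg", lambda m: [])
hub.send({"to": "nlu", "text": "denver"})
print(hub.trace)   # ['nlu', 'dm', 'nlg']
```

Because routing lives in one place, a new server (such as the task-specific-information server proposed later) can be added by registering it with the hub, without changing the other servers.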
2.1 The CU Communicator system
The CU Communicator is a conversational system that helps people plan trips by reserving flight tickets, booking hotel rooms and renting cars [4]. The DM in this system has an implicit domain: a large part of the task-related information is in the source code, in programs written in C, so adapting the system to a new domain is complicated. To change the domain, the grammar files for the NLU and the NLG and the task file (where the specific task is defined) have to be edited, some lines of the source code have to be rewritten, and the source code has to be recompiled.
In the CU Communicator, the task knowledge is represented in structures called frames and is specified partly in separate text files and partly within the source code. When a frame is complete, the DM can launch a query to the database via a database server. The CU Communicator is an event-driven system: what to do next is decided by the current state of the interaction, not by a script or a state transition network [5].
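A frame and the event-driven decision described above can be sketched in a few lines. This is a simplified illustration, not the CU Communicator's C data structures: the `Frame` class, the `next_action` function and the slot names are hypothetical. The decision depends only on the current frame state: prompt for the first missing slot, or query the database once the frame is complete.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Frame:
    """A minimal frame: a name, the slots it needs, and the values filled so far."""
    name: str
    required: List[str]
    values: Dict[str, str] = field(default_factory=dict)

    def fill(self, slot: str, value: str) -> None:
        if slot not in self.required:
            raise KeyError(f"{slot!r} is not a slot of frame {self.name!r}")
        self.values[slot] = value

    def missing(self) -> List[str]:
        return [slot for slot in self.required if slot not in self.values]

def next_action(frame: Frame) -> str:
    """Event-driven choice: depends only on the current state of the frame."""
    missing = frame.missing()
    if not missing:
        return "query_database"     # frame complete: hand off to the database server
    return f"prompt:{missing[0]}"
```

Since the user may fill slots in any order, no fixed script is needed: `next_action` is simply re-evaluated after every turn.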
2.2 MIT Mercury
Mercury is a telephone-based conversational system that provides information about flight schedules and pricing and enables users to book complex multi-leg itineraries to over 200 cities in the US and around the world. It manipulates linguistic and world knowledge represented in the form of semantic frames. Mercury manages the dialog by means of an ordered set of rules. More than 200 rules deal with different aspects, such as prompting for missing information, logging the user into the system, apologizing for unavailable services, handling requests for help or repetition, interpreting references to relative dates and times, preparing the reply frame after the database query has been done, and preparing replies [6].
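The idea of an ordered rule set can be shown with a small table of (name, condition, action) rules checked from top to bottom. This sketch assumes a first-match policy, which is our simplification; the rules, slot names and replies are hypothetical and do not reproduce Mercury's actual rule formalism.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, str]
Rule = Tuple[str, Callable[[State], bool], Callable[[State], str]]

# Order matters: help and repetition requests are handled before slot prompts.
RULES: List[Rule] = [
    ("help", lambda s: s.get("request") == "help",
     lambda s: "You can ask about flights between cities."),
    ("repeat", lambda s: s.get("request") == "repeat",
     lambda s: s.get("last_reply", "I have not said anything yet.")),
    ("need_source", lambda s: "source" not in s,
     lambda s: "What city are you leaving from?"),
    ("need_destination", lambda s: "destination" not in s,
     lambda s: "Where are you going?"),
    ("query", lambda s: True,
     lambda s: f"Looking up flights from {s['source']} to {s['destination']}."),
]

def respond(state: State, rules: List[Rule] = RULES) -> Tuple[str, str]:
    """Return (rule name, reply) for the first rule whose condition holds."""
    for name, condition, action in rules:
        if condition(state):
            return name, action(state)
    raise RuntimeError("no rule matched")
```

Other policies, such as letting every matching rule fire in order and accumulating their effects, are equally possible with the same table.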
2.3 TRIPS
The Rochester Interactive Planning System (TRIPS) is the latest in a series of prototype collaborative planning assistants. Its logical architecture consists of a complex network of agents, each with a well-defined linguistic or planning role. These component agents communicate with each other through a central module called the TRIPS facilitator, whose main job is to route each message to its intended receiver module.
TRIPS components can be divided into three areas of functionality: interpretation, generation, and behavior, which are directed by the Interpretation Manager, Generation Manager, and Behavioral Agent, respectively [7].
2.4 AT&T Laboratories
Researchers at AT&T Labs are developing a system based on a DM with three major contributions: a task knowledge representation, a Construct Algebra and a collection of dialog motivators. The task knowledge representation exploits object-oriented paradigms. The dialog motivators provide the DM with the dialog strategies that govern its behavior, and the Construct Algebra provides the building blocks needed to create new dialog motivators and analyze them. According to the authors, this way of defining the task makes the DM more flexible and easily adaptable to new domains. The task information in these systems is defined using an object inheritance hierarchy, which defines the relationships that exist among the pieces of task knowledge. The DM uses the inheritance hierarchy, together with an algorithm, to produce a set of semantically consistent inputs [8].
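To make the notion of an inheritance hierarchy of task knowledge concrete, the sketch below encodes hypothetical task types as Python classes whose slot sets are inherited and extended. The consistency test (a task is consistent with the user's input if it defines every filled slot) is our own simplification for illustration, not the AT&T algorithm or Construct Algebra.

```python
from typing import List, Set, Type

class Service:
    """Root of a hypothetical task hierarchy: every request has a date."""
    slots: Set[str] = {"date"}

class Flight(Service):
    slots = Service.slots | {"origin", "destination"}

class Hotel(Service):
    slots = Service.slots | {"city", "nights"}

class InternationalFlight(Flight):
    slots = Flight.slots | {"passport_country"}

def all_tasks(root: Type[Service]) -> List[Type[Service]]:
    """The root and all of its descendants, depth first."""
    tasks = [root]
    for sub in root.__subclasses__():
        tasks.extend(all_tasks(sub))
    return tasks

def consistent_tasks(filled: Set[str], root: Type[Service] = Service) -> Set[str]:
    """Names of task classes that define every slot the user has filled."""
    return {cls.__name__ for cls in all_tasks(root) if filled <= cls.slots}
```

For instance, once the user mentions an origin, only `Flight` and `InternationalFlight` remain consistent, while mentioning both a city and an origin leaves no consistent task.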
2.5 Topic forest
Researchers at the Center of Spoken Technology, University of Beijing, China, proposed a plan-based DM structure. They defined a tree-like structure called "the topic tree" to represent the topics of the dialog task. It defines the components hierarchically and states the relations between them, which makes it a good approach to separating the knowledge from the code in this system. One implementation is the Easy Flight system, a flight reservation system. The reasoning engine based on the Topic Forest is domain-independent and achieves mixed-initiative dialog control [9].
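A topic tree can drive the dialog with a generic, domain-independent traversal: the engine walks the tree depth first and talks about the first topic that still has an unfilled slot. This is a minimal sketch, assuming that simple policy; the `Topic` class, the traversal and the example topics are hypothetical and not the published Topic Forest engine.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Topic:
    name: str
    slots: List[str] = field(default_factory=list)
    children: List["Topic"] = field(default_factory=list)

def next_topic(topic: Topic, filled: Dict[str, str]) -> Optional[str]:
    """Depth first: return the first topic that still has an unfilled slot."""
    if any(slot not in filled for slot in topic.slots):
        return topic.name
    for child in topic.children:
        found = next_topic(child, filled)
        if found is not None:
            return found
    return None          # every topic is complete

# Hypothetical flight-booking task: only the tree is domain-specific.
booking = Topic("booking", children=[
    Topic("itinerary", ["origin", "destination"], [Topic("date", ["day"])]),
    Topic("passenger", ["name"]),
])
```

Mixed initiative falls out naturally: if the user volunteers their name first, that slot is simply recorded, and the engine still returns to `itinerary` on the next turn.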
As mentioned above, specifying a whole new task in a conversational system is very complicated, and we think there should be a way to define all the task-specific information completely apart from the source code. Both the CU Communicator and Mercury use the Galaxy architecture, and we have direct access to the source code of the CU Communicator, so we consider it a proper basis for our work.
In the next section we describe how we intend to modify the CU Communicator task files and servers (mainly the DM), and to create a new server, so that the interaction works the same way it does now but the task is defined independently of the source code.
3. A task-specific-information server
The CU Communicator task file is a text file in which task-specific information is defined, but much of this information is still distributed within the source code. The file mainly contains prompts, SQL templates and directives, and some hierarchical relations between the task components. Our aim is to extend this task file so that all the task-related information can be removed from the source code.
3.1 Task File
The new task file (see fig. 1) will be an extension of the current task file used in the CU Communicator, and it will contain three parts:
1) A definition of the database fields from which the information is going to be retrieved, as well as the data types, the frames and required slots, relations between them, and a set of definitions of the minimal slots required to launch a database query.
2) A list of restrictions used to verify the integrity of the task, ensuring that a given value falls within a certain range, or stating special restrictions between values for sanity checks.
3) A section for the prompts containing information on how to ask for specific information on a missing field, or how to generate responses to the user, etc.
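The "minimal slots required to launch a database query" in the first part can be read as a set of alternative slot combinations, any one of which suffices. The sketch below assumes that reading; the combinations and the helper names are hypothetical. It finds a satisfied combination, or else the slots still needed for the combination closest to completion, which the DM could then prompt for.

```python
from typing import FrozenSet, Iterable, List, Optional, Sequence, Set

# Hypothetical minimal combinations for a flight-schedule query:
# either a full route with a date, or a flight number with a date.
FLIGHT_COMBINATIONS: List[FrozenSet[str]] = [
    frozenset({"origin", "destination", "date"}),
    frozenset({"flight_number", "date"}),
]

def satisfied_combination(filled: Set[str],
                          combinations: Iterable[FrozenSet[str]]) -> Optional[FrozenSet[str]]:
    """Return the first combination whose slots are all filled, or None."""
    for combo in combinations:
        if combo <= filled:
            return combo
    return None

def still_needed(filled: Set[str], combinations: Sequence[FrozenSet[str]]) -> List[str]:
    """Slots missing from the combination closest to completion (ties: first listed)."""
    closest = min(combinations, key=lambda combo: len(combo - filled))
    return sorted(closest - filled)
```

For example, with `origin` and `destination` filled, no combination is satisfied yet and `still_needed` returns `["date"]`.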
Fig. 1. The flow of information in the new conversational system with the new task-specific information server.
These three parts are related to the NLU, DM and NLG modules, respectively. The last part concerns mainly the NLG module, but some pre-constructed task-dependent phrases will also be stated in the new task file. To manage all the information in the task file, we will create a Task-Specific-Information server that handles all the links between the system itself and the task-specific files. This new server will be closely related to the DM. The dialogue interaction will be carried out the same way the CU Communicator does it now. Currently, a mechanism called the action switch decides what action to take next based on a priority list [5], so we will modify the current DM to interact with the new server. The goal is to take some ideas from the systems described in section 2 and combine them in a single, more flexible system.
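A priority-list action switch of the kind mentioned above can be sketched as a list of actions, each with a priority and a condition, where the highest-priority applicable action wins. This is an illustration under assumptions, not the CU Communicator's C implementation: the action names, priorities and the `missing`/`violations` state keys are hypothetical. In the proposed design, the conditions would be answered by querying the task-specific-information server rather than being hard-coded.

```python
from typing import Callable, Dict, List, NamedTuple

State = Dict[str, List[str]]

class Action(NamedTuple):
    priority: int                       # lower number = higher priority
    name: str
    applies: Callable[[State], bool]

# Hypothetical priority list.
ACTIONS: List[Action] = [
    Action(2, "prompt_missing_slot", lambda s: bool(s.get("missing"))),
    Action(1, "report_inconsistency", lambda s: bool(s.get("violations"))),
    Action(3, "query_database",
           lambda s: not s.get("missing") and not s.get("violations")),
]

def choose_action(state: State, actions: List[Action] = ACTIONS) -> str:
    """Return the highest-priority action whose condition holds."""
    for action in sorted(actions, key=lambda a: a.priority):
        if action.applies(state):
            return action.name
    return "wait_for_user"
```

Keeping the conditions declarative means the DM loop itself does not change when the task changes; only the information it consults does.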
3.2 Task-Information Server
The main parts of this server will be:
a) A simple parser that will determine whether the written task file is consistent, i.e. that all the restrictions are consistent with the slot definitions, that the prompts are associated with specific valid slots or sets of slots, etc.
b) A frame creation engine that will analyze the first part of the task file and will create the frames and all the related data structures needed to carry out the interaction. The frames will be created before the interaction between the user and the system takes place, just after the parser verifies the integrity of the task file.
c) A sanity checker, i.e. a module invoked by the DM that loads the restrictions specified in the task file and verifies that the data are valid once the user has filled the slots required to launch a database query. If the data contained in the frames are inconsistent or do not satisfy the restrictions, the DM will then try to resolve the problem.
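Components a) and c) can be illustrated together: a static consistency check run once when the task file is loaded, and a run-time range check on the filled slots. This is a minimal sketch, assuming the task file has already been read into dictionaries and that restrictions are inclusive numeric ranges; the data layout and function names are hypothetical, since the restriction syntax is not specified here.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory form of a parsed task file.
Slots = Dict[str, str]                      # slot name -> type ("int", "str", ...)
Restriction = Tuple[str, float, float]      # (slot, minimum, maximum), inclusive
Prompts = Dict[str, str]                    # slot name -> prompt text

NUMERIC = {"int", "float"}

def validate_task(slots: Slots, restrictions: List[Restriction], prompts: Prompts) -> List[str]:
    """a) Static consistency check, run once when the task file is loaded."""
    errors = []
    for slot, low, high in restrictions:
        if slot not in slots:
            errors.append(f"restriction on undefined slot {slot!r}")
        elif slots[slot] not in NUMERIC:
            errors.append(f"range restriction on non-numeric slot {slot!r}")
        elif low > high:
            errors.append(f"empty range for {slot!r}: {low} > {high}")
    for slot in prompts:
        if slot not in slots:
            errors.append(f"prompt for undefined slot {slot!r}")
    return errors

def sanity_check(values: Dict[str, float], restrictions: List[Restriction]) -> List[str]:
    """c) Run-time check on filled slots before the database query is launched."""
    violations = []
    for slot, low, high in restrictions:
        if slot in values and not low <= values[slot] <= high:
            violations.append(slot)
    return violations
```

For example, with `passengers` restricted to 1 to 9, `sanity_check({"passengers": 12}, ...)` reports `["passengers"]`, and the DM can then re-prompt for that slot.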
The flow of information in the new system can be seen in figure 1, where the thick arrows show the flow of the conversation between the modules and the thin ones represent the flow of information between the servers. The dotted lines indicate the parts that will not be implemented in the present project.
3.3 Writing a new Task file
In our system, we will use a set of predefined tags that enclose all the information related to a specific task. These tags are: [task], [definitions], [relations], [minimal_combinations], [restrictions] and [prompts].
The [task] tag will only act as an identification name for a specific task; it will be the simplest tag in the file. Next, the [definitions] tag will contain a complete definition of the fields of the database to be consulted, including field name and data type (see fig. 2). The tag [dbfields] marks the beginning of the field list, and each new field begins with the tag [dbfieldXX], where XX is a sequential number that distinguishes the fields.
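The tag layout just described is simple enough to read with a small line-oriented parser. The sketch below handles only the portion of the format shown in fig. 2 ([task], [dbfields] and [dbfieldXX] entries with `key: value` lines); the function name and the returned structure are hypothetical, and the other sections ([relations], [restrictions], etc.) are skipped because their syntax is not given here.

```python
import re
from typing import Dict, List, Optional, Tuple

TAG = re.compile(r"^\[(\w+)\]\s*(.*)$")        # "[tag] rest-of-line"
FIELD_TAG = re.compile(r"^dbfield(\d+)$")       # dbfield01, dbfield02, ...

def parse_task_file(text: str) -> Tuple[str, List[Dict[str, str]]]:
    """Return (task name, list of database field definitions)."""
    task_name = ""
    fields: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None    # field currently being filled
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = TAG.match(line)
        if match:
            tag, rest = match.groups()
            current = None                      # any new tag closes the open field
            if tag == "task":
                task_name = rest
            elif FIELD_TAG.match(tag):
                current = {}
                fields.append(current)
            line = rest                         # e.g. "name: db_field_name01"
            if not line:
                continue
        if current is not None and ":" in line:
            key, value = line.split(":", 1)
            current[key.strip()] = value.strip()
    return task_name, fields
```

Parsing "[dbfield01] name: origin" followed by "type: varchar" yields `{"name": "origin", "type": "varchar"}`; a fuller parser would also check that the XX numbers are sequential, as the format requires.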
[task] name_of_the_task
[definitions]
[dbfields]
[dbfield01] name: db_field_name01
type: db_field_type01
[dbfield02] name: db_field_name02
type: db_field …
[relations]
[concept1]