Constantinos Boulis

RESEARCH STATEMENT

My research interests lie at the intersection of three fields: automatic speech recognition (ASR), natural language processing (NLP), and data mining. These three fields converge in Spoken Language Understanding (SLU) systems, i.e., systems that go beyond the surface recognition and interpretation of spoken language and therefore have the potential to revolutionize human-computer and human-human interaction. The promise of SLU is to deliver natural and efficient human-computer spoken interaction and to augment human-human spoken communication. Since the mid-1990s, the core technologies of ASR and NLP have reached a level of maturity that has allowed basic commercial and research SLU systems to be developed. Examples include the DARPA Communicator project, an effort to build a dialogue system for making travel reservations over the phone, and AT&T's "How May I Help You?"© project, a call-routing application in which customers describe their questions or problems in their own words and are automatically routed to the relevant department. These two examples show that the general term SLU encompasses a wide variety of applications with different research requirements. SLU is still in its early stages, and many promising avenues remain to be explored. My research on SLU will follow two main directions.

Reducing the cost of deploying SLU systems

Collecting, annotating, and analyzing training data are laborious, time-intensive, and expensive steps that currently account for the majority of the cost of deploying an SLU system. Moreover, new data have to be collected for each new task. Therefore, methods that require less annotated data without significantly compromising performance are extremely attractive. Equally important are reusing data from other domains and designing systems that are portable across operating conditions and tasks. In addition, since large amounts of unannotated data are usually easy to obtain, methods that exploit unannotated data, or even train an SLU system with no annotated data at all, are crucial. Research on semi-supervised learning for SLU has recently emerged, and initial results are promising.

SLU on human-human communication

The main focus of SLU has been human-computer interaction. Human-human communication, by contrast, has received far less attention, even though it occurs naturally and ubiquitously in settings such as business meetings and customer call centers. The objective of SLU on human-human communication is to augment the interaction rather than replace it, which makes it very different from SLU for human-computer interaction. In business meetings, SLU can be used to extract topics through topic segmentation, clustering, and characterization; these topics can in turn be used to retrieve relevant portions of meetings or to produce meeting summaries. In customer call centers, SLU can be used to assess how satisfied customers were with their interactions, and detecting frustration or satisfaction can be an important component of such systems.

Another interesting application of SLU on human-human communication is the 311 line, a phone line through which city residents make non-urgent requests and comments about city services. Callers converse with a city employee about the information they want to relay. A recent TIME magazine article (February 7th) described the 311 line as "a way of harnessing the collective needs of an entire city" and cited numerous cases in which the 311 line helped uncover knowledge that had not been obtainable by other means. The objective of an SLU system for the 311 line is to mine the vast amounts of speech data collected daily (New York City alone receives 41,000 calls to 311 every day) for useful and actionable knowledge. The 311 line is therefore a prime target for SLU that augments human-human communication. For example, while a conversation is in progress, the system could suggest topics or questions for the city employee to raise, based on what proved relevant in similar past conversations.