Feedback Analysis for User Adaptive Statistical Translation

Annual Public Report - Project Year 1

November 2010

Grant Agreement number: 247762

Project acronym: FAUST

Project title: Feedback Analysis for User Adaptive Statistical Translation

Funding Scheme: FP7-ICT-2009-4 STREP

Period covered: from 1 February 2010 to 15 November 2010

Project coordinator:

Dr William Byrne,

Reader in Information Engineering

Department of Engineering,

University of Cambridge, UK

Tel: +44 (0)1223 332651

Fax: +44 (0)1223 332662

E-mail:

Project website address:

This report summarizes progress in the first seven months of collaboration within the FAUST project.

Project Aim:

FAUST aims to develop fluent, web-based MT systems which respond to user feedback.

Web-based machine translation systems are now readily available in many of the world’s major and minor languages. The FAUST project was motivated by perceived shortcomings of a fundamental nature in the technological approaches to delivering machine translation to general populations of users. We elaborate here on the two key aspects of our project’s aim.

Feedback: As translation technology is brought ever closer to its community of users, there is strong potential for collaborative interaction between translators, casual active users, and technology developers. For example, on the Reverso.net website, which translates an average of 30 million text passages each month, users are invited to provide feedback and to suggest improvements to the automatic translation of any given sentence. Unfortunately, this feedback cannot yet be exploited because:

•User feedback tends to be very noisy;

•No research published to date makes explicit how statistical translation systems can be adapted to benefit from feedback provided by web users;

•No mechanisms exist for identifying valuable user feedback and immediately modifying a statistical MT system so that subsequent users do not run into the same problem.

‣FAUST aims to address these problems to ‘close the loop’ so that user feedback becomes part of the development and evaluation cycle for machine translation systems deployed in online translation.

Fluency: Machine translation can be disconcerting for the uninitiated. Automatic systems which make basic mistakes in grammar and word sense are perceived as unreliable and unpleasant to use. We take the view that MT systems must become fluent if they are to be accepted and trusted by large communities of users.

‣FAUST aims to improve user satisfaction with online MT by bringing natural language generation into statistical machine translation to improve MT fluency.

The project is organized to address the following technical objectives:

  1. Enhance the high-volume, Reverso.net translation website with an experimental and evaluation infrastructure for the study of instantaneous user feedback in MT.
  2. Deploy novel web-oriented, feedback collection mechanisms that reduce noise in feedback provided by users and increase the utility of the web contributions.
  3. Automatically acquire data collections to study translation with user feedback.
  4. Develop mechanisms for instantaneously incorporating user feedback into the machine translation engines that are used in production environments, such as those that power the Reverso.net website.
  5. Create novel automatic metrics of translation quality which reflect preferences learned from user feedback.
  6. Develop new translation models driven by user feedback data and integrate natural language generation directly into MT to improve translation fluency and reduce negative feedback from users.

Although it is too early in the course of our work to have deployed machine translation systems for use by the public, as Objective 1 (above) makes clear, this is a key goal for the project. FAUST will create two new interfaces to the Reverso.net translation services:

•labs.reverso.net : research MT systems will be deployed directly on this site so that researchers can observe users interacting directly with MT systems.

•forums.reverso.net : will provide a meeting place for translation users to interact with each other and to experiment with novel feedback collection mechanisms.

The Forum and Labs environments will support interaction between MT users and researchers

The project focuses on the following language pairs: Czech-English; French-English; Romanian-English; Spanish-English; Spanish-Catalan; and Arabic and Chinese into English. To meet the project objectives we have assembled a team of machine translation researchers and developers from academia and industry:

The project has been organized into Work Packages, according to the interest and expertise of the project partners:

•WP1 – Project Coordination and Management

•WP2 – System Architecture and Integration

•WP3 – Web-Oriented Feedback Collection

•WP4 – Automatic Acquisition, Annotation, and Modeling of User Feedback

•WP5 – User-driven MT Systems

•WP6 – Evaluation and Dissemination

We now focus on descriptions of the technical work packages WP2-6, since these are likely to be of the most general interest.

WP2 – System Architecture and Integration -- Language Weaver SRL

In the first project year, effort in this work package was focused on developing the common infrastructure to support the research and development efforts throughout the project. The main objective for this period is the:

‣Development of an architecture that will permit the evaluation and exploitation of novel web-oriented, feedback collection mechanisms that reduce noise in feedback provided by users and increase the utility of the web contributions.

Progress in this work package is necessarily technical in nature, but key achievements concern the development of APIs for integration of SMT systems into the labs.reverso.net environment as well as APIs for collecting and sharing user feedback:

•The Technical Architecture was conceived and delivered to the partners by month 3. The document has been reviewed, updated based on the feedback received, and finally approved by all the partners.

•The Translation APIs among all sites and components have been created, documented, reviewed and approved by all consortium members.

•Access to the fully operational SMT infrastructure has been provided to the partners with the use of the commonly agreed API; the Language Weaver translation infrastructure is fully operational and integrated into the FAUST platform located at

•Feedback is presently collected and stored locally by Softissimo.

This figure gives a high-level summary of the development issues faced by WP2:

Infrastructure for user feedback collection and deployment of research MT systems

WP3 - Web-Oriented Feedback Collection Mechanisms -- Softissimo Inc

In the first project year, work has focused on the creation of labs.reverso.net and forum.reverso.net with interfaces and support for MT research systems. Considerable effort has been spent on an initial analysis of the Reverso web logs to gain a better understanding of the nature of translation requests and user feedback before feeding that data into the annotation and analysis activities in work package 4. As the first step in this effort, a mechanism for the collection of user feedback in response to automatic translations has been established on the Reverso website. As an initial estimate of the amount of data collected, 130,000 instances of user feedback had been logged as of 11 August 2010. Separate interfaces are in place for each of the project languages, e.g. there is a French language interface for translations into French. As of September 2010, the statistics gathered via the various interfaces are: 100,000 items collected in French; 11,000 items collected in Spanish; 9,000 items collected in English; 5,000 items collected in Italian; and 400 items collected in German (although this is outside the project). We note that this is raw data tabulated from web logs. These numbers can be expected to vary considerably with further processing of the data.
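As a quick check on the per-language figures above, the following Python sketch tallies the September 2010 counts and computes each language's share of the raw logs. The numbers are copied from the text; the variable names are illustrative only and do not correspond to any project tool.

```python
# Raw per-language feedback counts as of September 2010 (taken from the text).
feedback_counts = {
    "French": 100_000,
    "Spanish": 11_000,
    "English": 9_000,
    "Italian": 5_000,
    "German": 400,  # collected, but outside the project languages
}

total = sum(feedback_counts.values())

# Share of each interface language in the raw logs, as a percentage.
shares = {lang: 100.0 * n / total for lang, n in feedback_counts.items()}

for lang, n in feedback_counts.items():
    print(f"{lang:8s} {n:7d} {shares[lang]:5.1f}%")
print(f"{'Total':8s} {total:7d}")
```

The per-language counts sum to 125,400, which is close to but not identical to the 130,000 figure quoted for 11 August; both are raw tabulations from web logs.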

The following is a preliminary summary based on observations of a sampling of the translation requests and associated feedback; please see for this report in its entirety. We emphasize that our project is focused on automatic methods for modeling and exploiting feedback, and so a detailed and exhaustive analysis of this data is not our main objective; however, some understanding of the data is necessary for building modeling techniques. An analysis of 100 translation suggestions in the French-to-English translation put the feedback into five different categories, as shown in this table.

Class / Number / Comments, examples
Improved / 47 / Not all errors are necessarily corrected (MT: He(It) is still in the head of this organization; User: He is still at the head of (in charge of) this organization). Includes selection of alternatives given between brackets (see the example above, or when the automatic translation of «ils ne comprennent pas» contains “understand(include)” and the suggestion keeps only “understand”). Includes corrections of errors caused by errors in the source text (Src: fenetre (missing accented letter); MT: fenetre; User: window).
Perfect (subset of improved) / 13 / Example: Src: Dans les pays les plus démunis en Afrique; MT: In countries the most deprived in Africa; User: In Africa's most deprived countries
Degraded / 6 / MT: Oh! You already went in France several times!; User: Oh! You already went in French several times!
Unchanged / 11 / No changes, or white-space only
Mixed / 9 / Various examples where the user visibly did his/her best, but the suggestion contains new errors. MT: ecoles of ingenieur; User: ecoles of ingénieur. MT: being capable of; User: being able of. MT: Saw you I speak too very English; User: You see I speek very well English.
Malevolence, user mistakes / 16 / Suggestion contains insults, unintelligible or unrelated text, etc.
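
Most of the classes in this table require human judgement, but the "Unchanged" class (no changes, or white-space only) can be detected automatically before any manual annotation. The following Python sketch shows one way such a pre-filter might look. The function names are hypothetical and are not part of any FAUST tool, and the third example pair is constructed for illustration rather than taken from the logs.

```python
def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace and strip leading/trailing whitespace."""
    return " ".join(text.split())


def is_unchanged(mt_output: str, suggestion: str) -> bool:
    """True if the suggestion differs from the MT output by whitespace only."""
    return normalize_whitespace(mt_output) == normalize_whitespace(suggestion)


def split_unchanged(pairs):
    """Partition (mt_output, suggestion) pairs into unchanged and changed."""
    unchanged, changed = [], []
    for mt_output, suggestion in pairs:
        target = unchanged if is_unchanged(mt_output, suggestion) else changed
        target.append((mt_output, suggestion))
    return unchanged, changed


pairs = [
    # From the table: a "Mixed" example.
    ("being capable of", "being able of"),
    # From the table: a "Degraded" example.
    ("Oh! You already went in France several times!",
     "Oh! You already went in French several times!"),
    # Constructed example: the suggestion only adds whitespace.
    ("The climate becomes more and more cold",
     "The climate  becomes more and more cold "),
]

unchanged, changed = split_unchanged(pairs)
print(len(unchanged), "unchanged;", len(changed), "need further annotation")
```

A filter of this kind only removes the trivially unchanged suggestions; separating improved, degraded, mixed, and malevolent feedback remains the harder problem.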

The following collection gives examples of instances of feedback provided by users in response to the automatic translation.

Source text / Automatic translation / Suggested translation / Comment
j'aimerais devenir hôtesse de l'air / I would like to become a stewardess / I would like to become an air hostess / Correct, but not very interesting.
. Le plus drôle, c’est lorsqu’il vient près de moi lorsque je suis tranquille, avec un jouet dans la gueule. À ce moment, je suis sûre que s’il pouvait parler il dirait « JE VEUX JOUER !» / . The funniest, it is when come I when I am quiet, with a toy in the mouth. At this moment, I am safe(sure) that if he could speak he would say " I WANT TO PLAY! " / . The funniest,it's when I'm quiet, he come with a toy in the mouth. At this moment, I am sure that if he could speak he would say " I WANT TO PLAY! " / Correct. «Je suis sûr que» -> “sure” instead of “safe(sure)” is interesting. Start of sentence hard to exploit.
charles de gaulle est né le 22 novembre 1890 / Charles de Gaulle was born on November 22nd, 1890 / Charles de Gaulle was born on November 22rd, 1890 / Bad.
Hier, j'étais à la mer mais il pleuvait alors je suis rentré chez moi / Yesterday, I was in the sea but it rained then I returned at home / Yesterday, I was at the beach but it rained then I came back home / Correct and interesting.
Demande de documentation. le 19 février 1990. Monsieur, Je cherche des étagères pour livres qui couvriraient tout un mur de mon salon. Pourriez-vous, s’il vous plaît, m’envoyer un catalogue qui présenterait tous les modèles que vous avez en stock ainsi que la liste des prix. En vous remerciant d’avance, je vous prie de croire, Monsieur, à l’expression de mes sentiments les meilleurs. / Demand(Request) of documentation. February 19th, 1990. Sir, I look for shelves for books(pounds) which would cover a whole wall of my lounge(show). You could, please, send me a catalog which would present all the models That you have in stock as well as the price-list. Sincerely Yours The best feelings. / Demand of documentation. February 19th, 1990. Sir, I look for booksshelves which would cover a whole wall of my living room. You could, please, send me a catalog which would present all the models That you have in stock as well as the price-list. Sincerely Yours The best feelings. / «étagères pour livres» -> “bookshelves” instead of “shelves for books(pounds)” and «salon» -> “living room” instead of “lounge(show)” are both interesting.
beaucoup de gens ne veulent pas être contactés n'importe quand. / A lot of people does not want to be contacted whenever. / A lot of people don't want to be contacted at any time. / Correct and very interesting.
le climat devient de plus en plus froid / The climate becomes more and more cold / the climate is becoming colder and colder / Correct and interesting.

WP4 - Acquisition, Annotation, and Modelling of Feedback -- Universitat Politecnica de Catalunya

This work package aims to analyze feedback provided by users. The current objectives, in the first year of the project, are to:

•Develop methods for automatically annotating and analyzing translation requests and SMT system outputs

•Produce an open source suite of automatic metrics for SMT.

•Produce linguistic annotation tools to be used throughout the project. This includes the development of novel parsing techniques for analyzing noisy translation output

The following figure shows, highlighted in blue, the role of feedback annotation and modeling within the project. The aim is to abstract from the data we collect the information needed to characterize the task and to refine our MT engines.

The role of feedback modelling within the FAUST project

Some highlights from the first year of the project are:

•We have gathered linguistically-annotated corpora (i.e., treebanks) for the various languages involved (English, Spanish, Catalan, French, Czech, and Romanian). We are using these data to train our linguistic processors, and these processors have now been trained and are working well for regular text (i.e., well-formed sentences). One of the challenges in the FAUST project is to deal with possibly poorly-formed sentences, e.g., arising from user inputs or system outputs. For this reason we are adapting linguistic processors to the analysis of noisy text.

•We have compiled a static corpus collection to be used for training and testing our SMT systems. We are using the publicly available data sets from the Fifth Workshop on Statistical Machine Translation (WMT10). We have prepared these corpora and performed their basic annotation (tokenization, part-of-speech tagging and lemmatization) for English, Spanish and Catalan. Annotation for other languages (French and Czech) and at deeper linguistic levels (i.e., syntactic dependencies and semantic roles) is currently ongoing.

•In the development of metrics to assess the difficulty of translation requests and the quality of system outputs, work has begun with a review of the state of the art in confidence estimation, i.e., evaluation of the translation quality of system outputs when no translation references are available, and in the automatic assessment of translation difficulty. We are currently implementing an initial set of measures. We have also designed an on-line learning architecture that combines different evaluation metrics into a single measure of quality, with their relative contributions adjusted based on user feedback.

•We have extended the metric set in the IQMT Framework for Automatic MT Evaluation. Specifically, we have developed a series of document-level evaluation metrics based on discourse representations. We have also begun work on porting these linguistic measures to languages other than English, such as Spanish and Catalan.
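
The idea of combining several evaluation metrics into a single quality measure, with weights adjusted from user feedback, can be illustrated with a small Python sketch. The report does not specify the learning algorithm, so this example assumes a linear combination updated with a pairwise perceptron step; the metric names, scores, and learning rate are all hypothetical.

```python
from typing import Dict

Scores = Dict[str, float]


def combined_score(weights: Scores, metric_scores: Scores) -> float:
    """Weighted sum of individual metric scores."""
    return sum(weights[name] * metric_scores[name] for name in weights)


def update_weights(weights: Scores, preferred: Scores, rejected: Scores,
                   learning_rate: float = 0.1) -> Scores:
    """One online perceptron step from a single user preference.

    If the current combination already ranks the preferred translation
    strictly above the rejected one, the weights are returned unchanged.
    Otherwise each weight moves in the direction of the metric difference.
    """
    if combined_score(weights, preferred) > combined_score(weights, rejected):
        return dict(weights)
    return {
        name: w + learning_rate * (preferred[name] - rejected[name])
        for name, w in weights.items()
    }


# Hypothetical metric scores for two translations of the same request.
weights = {"metric_a": 0.5, "metric_b": 0.5}
preferred = {"metric_a": 0.2, "metric_b": 0.9}  # the translation the user preferred
rejected = {"metric_a": 0.8, "metric_b": 0.4}

weights = update_weights(weights, preferred, rejected)
print(weights)
```

In this toy case the initial weights rank the rejected translation higher, so the update shifts weight from metric_a to metric_b, after which the preferred translation scores higher. In general a single step is not guaranteed to correct the ranking, and noisy feedback would call for a more robust update rule.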

WP5 - User-driven MT systems -- Charles University

This work package is focused on the development of MT systems which respond to user feedback. Work in this first project year is focused on the following objectives:

•Create baseline MT systems for all language pairs in the project

•MT system training and adaptation to user feedback

•MT confidence for guiding user feedback

Baseline systems have been developed for all the language pairs of interest to the project. This is a mix of commercial systems running at reverso.net and academic research systems which are being prepared for integration into labs.reverso.net . Commercial systems developed by Language Weaver are already running at Reverso.net. Notable points concerning the academic research systems are that:

•the UPC Spanish-English SMT system has been tested with the sites

• the CU TectoMT system, which translates texts from English to Czech, was adapted to run as a web service and can communicate with sites using our conventional REST API

• fast Hiero grammars have been developed for the Cambridge HiFST system to enable its use as a web service.

WP6 – Evaluation and Dissemination -- University of Cambridge

The objectives of this work package within the current project year are:

•Common and consistent use of resources

•Consistent evaluation of all component technologies

•Internal evaluation of integrated systems, research and real-time demonstrators

•Demonstrate that the quality of the research within the project is at the state of the art by participating in international evaluations of translation technology

•Release Open Source modelling and analysis tools developed within the project

Given that the project is still in its early stages, our dissemination and evaluation activities have begun but have not yet reached the full levels of activity planned for years two and three. Some highlights of the year so far are:

•Specification of the size and selection criteria for the English-to-Czech/French/Romanian/Spanish development test sets. There was extensive discussion prior to data set selection by Language Weaver from web logs provided by Softissimo. Source segments in Czech, English, French, Romanian, and Spanish have been randomly selected from translation requests submitted to the LW Translation On-Demand Infrastructure during the months of Jan-Feb 2010.

•Distribution and support for translation software within the project. LW has provided translation support software through a modified version of Olifant.

•Development of translation guidelines and conventions for translation from English. Development of these guidelines was an iterative process guided by early feedback from the translators working with the project partners. Items which involved significant discussion were:

(1) the need to allow for ‘partially sensible’ translations in which some but not all of the source text was fluent

(2) whether to allow translators to provide a corrected form of the source text