Noun Phrase Translation: A Discriminative Approach

Grazia Russo-Lassner

Department of Linguistics

University of Maryland, College Park

Ling 895

29 November 2004

Abstract

A Noun Phrase translation system is potentially useful to a variety of multilingual Natural Language Processing applications. This study presents a generalized NP translation architecture that uses a discriminative reranking approach to reorder the list of candidate translations produced by a standard Machine Translation system. The discriminative model exploits linguistic and statistical knowledge about the phrase pairs used in training. Innovations with respect to similar approaches include the use of a web-based contextual similarity measure and a pseudo-supervised learning technique.

Experimental results show significant improvements in ranking NP translations, and support the value of the web-based contextual similarity feature.

Table of contents

Abstract

Table of contents

List of tables

List of figures

1. Introduction

2. Motivations for an NP translation subsystem

2.1 Pseudo-phrase translations for Cross-Language IR

2.2 Answer pinpointing using NPs in Multilingual QA

2.3 Word order with an NP translation system in Statistical MT

3. NP translation architecture

3.1 Related approaches

3.2 Discriminative Reranking

3.3 Generating Candidate Phrases

3.4 Features

3.5 Acquiring Web Data

4. Experiments

4.1 Data and model training

4.2 Evaluation Metrics

4.3 Feature Selection

4.4 Results

5. Other approaches to statistical NP translation

6. Future Work

7. Conclusions

Acknowledgements

List of tables

Table 1. Characteristics of the EuroParl corpus and size of NP parallel corpus extracted from it

Table 2. NP pairs used for reranking

Table 3. Searching for phrases on the Internet Archive versus AltaVista

Table 4. Dev RList results

Table 5. Examples showing the effects of the web-context similarity

Table 6. Differences between models: on source and target counts

Table 7. Differences between models: on demotion of candidates based on source and target counts

Table 8. Differences between models on dev set: on source and target token counts

Table 9. Differences between models on dev set: on source and target token counts again

Table 10. The effect of the web context similarity features on dev set seen through ThirdRight

Table 11. Dev AList results

Table 12. Test RList results

Table 13. Differences between models on test set: on translation model score

Table 14. Differences between models on test set: on phrase length issues

Table 15. The effect of the web context similarity features on test set seen through ThirdRight

Table 16. Test AList results

Table 17. Summary of other approaches to NP/NE translation

List of figures

Figure 1. Phrasal pseudo translation of a Spanish document and Systran translation of the same document

Figure 2. Example of reranked candidate list

Figure 3. Classifier training

Figure 4. Increase in Avg[N] for different sizes of n-best list

Figure 5. Histogram of candidate list sizes

Figure 6. Features of the top-100 best models on development set

  1. Introduction

Noun Phrases (NPs) are a pivotal constituent structure in sentences. Consequently, an NP translation system is potentially useful to a variety of multilingual Natural Language Processing applications. In Cross Language Information Retrieval (CLIR), it has been shown that noun phrases are effective indicators of document content; in Multilingual Question Answering (MQA), noun phrases might be a more effective unit than single words for searching for and identifying answers that can only be expressed at the phrasal level; in Statistical Machine Translation (SMT), a standalone NP translation system might help reduce the computational costs associated with word order and possibly improve output quality.

This study presents a generalized NP translation architecture, in which a statistical MT model is used to generate NP candidates in an initial ranking, and then a discriminative model is trained on a set of features to rerank the candidate list so that the correct translation appears first in the list. This architecture makes it easy to exploit NP-specific features that cannot be easily integrated into a translation model without increasing the complexity and the computational costs of its algorithm. With respect to previous approaches to NP translation, the innovations in this study include features that represent knowledge about both the source phrase and the target phrase, a web-based contextual similarity feature, and a pseudo-supervised learning approach to produce the negative evidence necessary to train the reranking model.
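
To make the reranking step concrete, the following sketch (our own Python illustration, not the exact implementation used in this study) shows how an n-best list produced by a base MT model could be reordered by a linear combination of features; the feature names, the weights, and the linear scoring function are all illustrative assumptions.

# Minimal sketch of discriminative reranking of an MT n-best list.
# The features and weights below are illustrative stand-ins; the actual
# system uses its own feature set (translation-model scores, web-based
# contextual similarity, etc.) and a trained discriminative model.

from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Candidate:
    target: str        # candidate NP translation
    base_score: float  # score assigned by the base MT model

FeatureFn = Callable[[str, Candidate], float]

def rerank(source_np: str,
           nbest: List[Candidate],
           features: Dict[str, FeatureFn],
           weights: Dict[str, float]) -> List[Candidate]:
    """Reorder an n-best list using a linear combination of features."""
    def score(cand: Candidate) -> float:
        return sum(weights[name] * fn(source_np, cand)
                   for name, fn in features.items())
    return sorted(nbest, key=score, reverse=True)

# Toy usage with two hypothetical features:
features = {
    "base_score": lambda src, c: c.base_score,
    "length_diff": lambda src, c: -abs(len(src.split()) - len(c.target.split())),
}
weights = {"base_score": 1.0, "length_diff": 0.5}
nbest = [Candidate("certain difficulties practical", -2.1),
         Candidate("certain practical difficulties", -2.3)]
reranked = rerank("ciertas dificultades prácticas", nbest, features, weights)

In the actual architecture, the scoring comes from the trained discriminative model rather than hand-set weights, and the feature set includes the web-based contextual similarity measure introduced above.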

Experimental results demonstrate a statistically significant improvement over the baseline and the usefulness of contextual information in reranking the NP translation candidates, although the latter effect is clear on the development set but not as clearly confirmed on the test set.

  2. Motivations for an NP translation subsystem

An NP translation subsystem is potentially useful in a variety of NLP applications. Statistical Machine Translation (SMT), Cross Language Information Retrieval (CLIR), and Multilingual Question Answering (MQA) could all benefit from incorporating such a subsystem. This section illustrates how an NP translation subsystem could be useful to each of these applications. First, we present recent studies in CLIR showing that phrasal translations of a document can be as effective as a complete translation for document relevance assessment and document search. Next, we discuss how MQA could exploit an NP translation system to identify the answer in a document. Finally, we discuss how an NP translation subsystem could help SMT reduce the computational costs of taking word order into account, and the potential benefits for word choice when the correct head-to-head relationship between a verb and the head noun of its subject or object can be established by correctly translating the NPs in a sentence.

2.1 Pseudo-phrase translations for Cross-Language IR

In Cross Language IR, López-Ostenero et al. (2001) showed that the relevance of a document can be confidently assessed through its NPs, even when they are presented independently of a sentential context. As an example, consider the Systran translation of a Spanish document and compare it to the phrasal pseudo-translation displayed in Figure 1.[1]

PHRASAL PSEUDO TRANSLATION
-the Spanish Catholic Church
-a big manifestation
-next month
-its rejection
-the legislative measures
-the Socialist government
-all the dioceses
-the support of movements
-Catholics
-Fundamental topics
-Family
-Life
….
------
SYSTRAN translation:
The Church prepares a manifestation against the measures of the Madrid Government.
The Spanish catholic Church prepares one "gran manifestación" for the next month of December in Madrid, with the objective to express its rejection to the legislative measures that the socialist Government is developing. The call, made by all the diocese through its delegations of Education with the support of movements, Christian associations and groups, tries to mobilize to the catholic faithfuls around "los three fundamental subjects of Family, Life and Education, on which this Government is legislating of form so 'poco dialogante'", it on the matter says the call of the Madrilenian diocese of Alcala de Henares. The announcement of this manifestation that appears in the page Web of the diocese of Alcala, adds that it will maintain informed his faithfuls into the exact date of this manifestation and remembers to them that "nuestra is present at will absolutely be necesaria". Next to the call for this manifestation and pastoral letters of the bishops denouncing the legislative measures of the Government, the Spanish dioceses have initiated a collection of companies having requested that the religion subjet is distributed as it is predicted in the LOCE and answering the questionnaire raised by the Ministry of Education on the predicted reform. To this campaign of the SpanishChurch they have also been added in the last weeks, with his declarations, eminent prelates of the Vatican bar.

Figure 1. Phrasal pseudo translation of a Spanish document (top) and Systran translation of the same document (bottom).

For purposes of relevance assessment, the phrasal pseudo-translation is as effective as a complete translation of the document, and it can even achieve higher recall.

In subsequent experiments in interactive cross-language query formulation, López-Ostenero et al. (2002) showed that phrase-based search methods achieve a 65% improvement in terms of precision and recall over word-based search methods. In their system, the phrases selected by the user are automatically translated and a monolingual search over the target-language documents is performed. Finally, the query is enriched with phrases previously extracted and translated in a pre-processing step[2], following the same techniques as López-Ostenero et al. (2001). An advantage of using phrases over words for cross-language document selection, besides the improved precision and recall scores, is faster document selection on the part of humans, since phrases are generally less ambiguous than translations of single words. Moreover, phrases are preferable to full document translation because they minimize the computational cost associated with machine translation.
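
As a concrete (and deliberately simplified) illustration of this workflow, the sketch below translates user-selected phrases with a toy bilingual phrase table and then performs a monolingual match against target-language documents; the phrase table, the documents, and the matching criterion are all assumptions made for the example, not details of López-Ostenero et al.'s system.

# Illustrative sketch of phrase-based cross-language search: translate
# user-selected phrases with a bilingual phrase table, then run a
# monolingual match against target-language documents. The phrase table
# and documents below are toy examples.

from typing import Dict, List

phrase_table: Dict[str, str] = {
    "iglesia católica española": "Spanish Catholic Church",
    "medidas legislativas": "legislative measures",
}

documents: Dict[str, str] = {
    "doc1": "The Spanish Catholic Church rejects the legislative measures ...",
    "doc2": "Parliament debated agricultural subsidies ...",
}

def cross_language_search(selected_phrases: List[str]) -> List[str]:
    """Return the ids of documents containing any translated phrase."""
    translated = [phrase_table[p] for p in selected_phrases if p in phrase_table]
    return [doc_id for doc_id, text in documents.items()
            if any(t.lower() in text.lower() for t in translated)]

print(cross_language_search(["iglesia católica española"]))  # ['doc1']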

In addition to document relevance assessment and document search, a phrase-based approach to IR is useful for creating short summaries of the original documents that can be exploited for IR purposes. The most recent experiment by López-Ostenero et al. (2003) confirmed that phrase-based summaries corresponding to about 30% of the original document are indeed a viable approach to cross-lingual IR. In their study, they tried to determine how reliable these short summaries are when matched against a query that is itself formulated as a set of noun phrases. Two ways of performing document retrieval were compared in their experiment: “query translation” and “document translation”. “Query translation” searched for relevant documents with the translated phrase-based queries; “document translation” matched the phrase-based queries against the phrase-based summaries. The two methods resulted in the same precision and recall scores, thus confirming that phrase-based summaries can be used for cross-language indexing of a document collection. As the authors themselves point out, however, a more sophisticated NP translation approach would be needed to lend more credibility to these results.

2.2. Answer pinpointing using NPs in Multilingual QA

Another NLP application in which an NP translation subsystem can be useful is in Multilingual Question Answering. Question Answering combines Information Retrieval and Natural Language Processing in the same task. A QA system receives a natural language query in the form of a question as input, and processes large unstructured document collections to return a precise answer, rather than an entire document as in IR. In Multilingual Question Answering, the input question and the document collection are written in two different languages.

To automatically answer simple questions, a query is usually constructed from the user’s question, an IR-style search is performed to locate the documents that are likely to contain the answer, and then the passage of the document most likely to correspond to the answer has to be identified (or pinpointed). To identify the answer in the relevant document, it is common practice to employ a window of words and score the content words in that window on the basis of whether they match, or are variants of, the words in the original question (a rough sketch of this kind of window scoring is given after the example below). Answer pinpointing that relies on window-based word scoring is likely to miss information that can only be expressed at the phrasal level. For instance, for the question How big is Mars? the answer could be expressed as a noun phrase, Mars’s equatorial radius is 3397.2 km, or as a clause, Mars is 0.532 times the diameter of Earth; the question How did Iowa get its name? is even more challenging.[3] We typed the above question into the Google search engine, letting the engine figure out the query, and examined a few of the documents returned. Among these is the one pasted below, in which the possible answers have been underlined:

STORIES OF IOWA FOR BOYS AND GIRLS

CHAPTER X

ALBERT M. LEA AND THE NAMING OF IOWA

One morning in June, 1835, there was a great stir at old Fort Des Moines, which, you remember, had been built near the site of Tesson's apple orchard. Three companies of mounted soldiers, called dragoons, were starting out on a long march northward into Minnesota. Their leader was Lieutenant Colonel Stephen Watts Kearny.

(…)

Lea's book is interesting because in it the name Iowa was first used for the territory now included in the State of Iowa. Where did Lieutenant Lea get the name Iowa? He tells us in the book that he took it from the name of the Iowa River.

We do not know for certain how the Iowa River got its name, but it may have been named from the Iowa tribe of Indians. Years afterwards Lieutenant Lea decided that the name should have been spelled Ioway; but by that time Congress had created first the Territory of Iowa and then the State of Iowa, so no change in the spelling was made.

(…)

In 1838, two years after the printing of the book which gave Iowa its name, Lieutenant Lea came back to Iowa Territory as one of the men to fix the boundary between Missouri and the Territory of Iowa. Later he spent several years as an engineer. When the Civil War began he joined the Confederacy and served for four years in the Confederate army. When the war ended he moved to Corsicana, Texas, where he died in 1890. Although Albert M. Lea named Iowa, his own name appears in Minnesota rather than in Iowa. The city of Albert Lea, Minnesota, was named for him, and there is also a Lake Albert Lea in Minnesota.

(from visited on Oct24, 2004)

The answers range from a short clause such as Albert Lea named Iowa to a more elaborate answer that might include information on the Iowa tribe of Indians and the Iowa River. If, as López-Ostenero et al. have shown, the relevance of a document can be effectively assessed by translating only its NPs, then MQA should also be able to exploit pseudo-translation of NPs to retrieve the relevant documents and possibly to identify the answer in the most relevant document. For instance, a Spanish user trying to get an answer to the question Por qué fue [la propuesta de ley de salud de Hilary Clinton] rechazada? by searching an English document collection is asking something about the whole bracketed NP, not about the smaller noun phrases contained in it. The NP la propuesta de ley de salud de Hilary Clinton would have to be correctly translated into Hilary Clinton Health Care Bill Proposal for the IR component of the QA system to be able to return relevant documents. Translating the NP correctly is essential to identifying the head of the NP, which in turn is fundamental to classifying the NP as a person versus a place or an organization; identifying the category to which an NP belongs is crucial for a QA system to recognize the type of answer it is supposed to return (TIME, PERSON, LOCATION, MEASURE, OTHER, etc.). [4]
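
The window-based scoring mentioned at the beginning of this section can be sketched as follows (a bare-bones illustration under our own assumptions about tokenization, the stopword list, and the window size, not the method of any particular QA system):

# Illustrative sketch of window-based answer pinpointing: slide a
# fixed-size window over the document and score each window by its
# overlap with the question's content words. The stopword list and the
# window size are arbitrary choices made for this example.

from typing import List, Tuple

STOPWORDS = {"how", "did", "get", "its", "the", "a", "of", "is", "was"}

def content_words(text: str) -> set:
    """Lowercase, strip simple punctuation, and drop stopwords."""
    return {w.lower().strip(".,?!") for w in text.split()} - STOPWORDS

def best_window(question: str, document: str, size: int = 20) -> Tuple[int, List[str]]:
    """Return the overlap score and tokens of the best-scoring window."""
    q_words = content_words(question)
    tokens = document.split()
    best_score, best_span = -1, []
    for i in range(max(1, len(tokens) - size + 1)):
        window = tokens[i:i + size]
        score = sum(1 for w in window if w.lower().strip(".,?!") in q_words)
        if score > best_score:
            best_score, best_span = score, window
    return best_score, best_span

# Toy usage on a sentence from the passage above:
score, span = best_window("How did Iowa get its name?",
                          "Although Albert M. Lea named Iowa, his own name appears in Minnesota.")

A window scored this way can only reward individual word matches; it has no notion of the NP or clause structure that, as the examples above suggest, the answer may require.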

2.3. Word order with an NP translation system in Statistical MT

In the framework of SMT, word order issues pose a snag. Divergences in word order between languages can be found both at the sentence level and at the phrasal level. At the sentence level, a fixed word order language like English places the main verb between the subject and the object, while in free word order languages like Japanese and Turkish it is the object that is placed between the subject and the verb. Variations can also be found at the phrasal level, even between languages that belong to the same typology: English is head-initial for the VP but head-final for the NP. Spanish NPs are usually head-initial, with the adjective placed after the noun, but are often head-final, depending on the function of the adjective itself.[5]

In SMT, the problem of word order must be dealt with both during training and during decoding, and it represents a challenge in terms of computational costs as well as modeling choices. During training, a translation model has to learn the correspondences (also called alignments) between word positions in a source sentence and the positions of the corresponding words in the target sentence. Long sentences represent a serious difficulty due to the very large number of possible word-level alignments; in fact, there are (I + 1)^J possible word-level alignments between a source sentence of length I and a target sentence of length J. At decoding, an extremely large number of hypotheses on the reordering of the target words must be taken into account. As the space of possible translations is extremely large, decoders typically have to balance speed and quality: if the search is limited to only a portion of the space of possible translations, good translations risk being missed, while exhaustive searches lead to unacceptably slow decoding. [6]
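
To give a sense of scale, the following snippet (plain arithmetic, not part of the translation system) simply evaluates (I + 1)^J for a few sentence lengths, on the reading that each of the J target words may align to any of the I source positions or to an empty (NULL) position:

# Number of possible word-level alignments, (I + 1) ** J: each of the
# J target words aligns to one of the I source positions or to NULL.
for I, J in [(5, 5), (10, 10), (20, 20), (30, 30)]:
    print(f"I = {I:2d}, J = {J:2d}: {(I + 1) ** J:.2e} alignments")
# I =  5, J =  5: 7.78e+03 alignments
# I = 10, J = 10: 2.59e+10 alignments
# I = 20, J = 20: 2.78e+26 alignments
# I = 30, J = 30: 5.51e+44 alignments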

The computational costs of taking word order into account can be illustrated with a few examples. The decoder’s outputs for Spanish noun phrases[7] like ciertas dificultades prácticas, profunda injerencia centralista, and amplio respaldo internacional are certain difficulties practical, serious interference centralist, and broad support international. The translation candidates are “almost” the desired output, except that the word order is not what would be expected in English. These are cases of search errors in which either the language model or the translation model is responsible for steering the decoder in the wrong direction. It is possible that in the cases above the language model, which is responsible for scoring precisely the word order in the output translation, does not discriminate between what is a good translation and what is not. But there is another explanation: the decoder might be avoiding distortion. Without going too much into the details, the decoder used in this study starts from an initial gloss of the input string, which consists of aligning each word in the foreign sentence to its most likely translation in English; then, it iteratively examines all other alignment (and translation) hypotheses that are one operation away from the alignment under consideration.[8] At every step, the decoder chooses the alignment of highest probability, until no improvement can be made with respect to the probability of the current alignment. It is possible that the distortion parameters coming from the translation model are not good enough to steer the decoder towards a translation candidate with a different word order from the input string. In such a situation, even in the presence of a good language model, the translation model that is avoiding distortion might outweigh the language model and trigger these search errors.[9]
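
The greedy search just described can be caricatured in a few lines (a simplified sketch under our own assumptions: only two kinds of local operation, adjacent swaps and single-word substitutions, and a score function standing in for the combined translation and language models):

# Simplified sketch of greedy decoding: start from a word-for-word
# gloss, then repeatedly apply the best single local edit (adjacent
# swap or one-word substitution) until no edit improves the score.
# score() stands in for the combined translation/language model.

from typing import Callable, Dict, List

def greedy_decode(source: List[str],
                  gloss: Dict[str, List[str]],
                  score: Callable[[List[str]], float]) -> List[str]:
    # Initial hypothesis: each source word replaced by its most likely
    # translation; align[i] remembers which source word produced hyp[i].
    hyp = [gloss[w][0] for w in source]
    align = list(source)
    improved = True
    while improved:
        improved = False
        best_hyp, best_align, best_score = hyp, align, score(hyp)
        # Operation 1: swap two adjacent target words.
        for i in range(len(hyp) - 1):
            cand = hyp[:i] + [hyp[i + 1], hyp[i]] + hyp[i + 2:]
            cand_align = align[:i] + [align[i + 1], align[i]] + align[i + 2:]
            if score(cand) > best_score:
                best_hyp, best_align = cand, cand_align
                best_score, improved = score(cand), True
        # Operation 2: replace a word with an alternative translation.
        for i, src in enumerate(align):
            for alt in gloss[src][1:]:
                cand = hyp[:i] + [alt] + hyp[i + 1:]
                if score(cand) > best_score:
                    best_hyp, best_align = cand, align
                    best_score, improved = score(cand), True
        hyp, align = best_hyp, best_align
    return hyp

With a score function that rewards English adjective-noun order strongly enough, the initial gloss certain difficulties practical would be swapped into certain practical difficulties; if the translation model's distortion penalty dominates, the swap never improves the score and the gloss order survives, which is exactly the kind of search behavior described above.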