GSLT: Statistical Methods for NLP, spring term 2004

Practical Assignment 2: Statistical Machine Translation

Atelach Alemu Argaw

Kristina Nilson

1. Introduction

The purpose of this practical assignment is to construct a simple statistical machine translation system that translates individual sentences (in the form of blocks world commands) from Swedish to English, using an aligned translation corpus as training data.

2. Statistical Machine Translation

The idea of statistical machine translation is that by examining parallel data one should be able to learn how one language translates into another. Following e.g. Brown et al. (1990), the translation process is viewed as a noisy channel problem: a source signal is assumed to pass through a noisy channel that garbles it into a distorted target signal, and the output of the channel depends probabilistically on the input (that is, the distortion is not random). Applied to machine translation, this means that we reason backwards to find the source sentence that is most likely to have produced the target sentence, given the properties of the noisy channel (Manning & Schütze, 1999).

According to this method, the most probable translation T* of a source sentence S is found by maximizing, over all candidate translations T, the product of the string probability of T and the translation probability of S given T:

T* = argmax_T P(T | S) = argmax_T P(T) P(S | T)
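
The product form follows from Bayes' rule. The short derivation below is a sketch using the symbols S and T from the text; the hat notation for the selected translation is our own. Because P(S) is the same for every candidate T, it can be dropped when maximizing, so the posterior is proportional (not equal) to the product of the two models.

```latex
\begin{align*}
\hat{T} &= \arg\max_{T} P(T \mid S) \\
        &= \arg\max_{T} \frac{P(T)\, P(S \mid T)}{P(S)} \\
        &= \arg\max_{T} P(T)\, P(S \mid T)
\end{align*}
```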

In order to try out this statistical translation method, we need:

  1. a language model P(T), to ensure that the output is well formed in the target language
  2. a translation model P(S|T), to ensure that the output is a correct translation of the source
  3. an algorithm for finding the optimal translation of a sentence, given all possible translations. That is, for each source sentence we try to find the target sentence that maximizes the product of the language model and the translation model (Knight, 1997).
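
How the three components fit together can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the two scoring callbacks, and the toy log probabilities are hypothetical and are not taken from the assignment data. Scores are added rather than multiplied because they are log probabilities.

```python
def best_translation(source, candidates, lm_logprob, tm_logprob):
    """Return the candidate T maximizing log P(T) + log P(S | T)."""
    return max(candidates, key=lambda t: lm_logprob(t) + tm_logprob(source, t))


# Toy scores (invented for illustration only).
toy_lm = {"take the cube": -1.0, "take cube the": -3.0}
toy_tm = {"take the cube": -0.5, "take cube the": -0.5}

result = best_translation(
    "ta kuben",
    list(toy_lm),
    lm_logprob=lambda t: toy_lm[t],
    tm_logprob=lambda s, t: toy_tm[t],
)
print(result)
```

Enumerating every candidate is only feasible for toy examples, which is why a search algorithm is listed as the third component.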

3. Translation of blocks world commands

For this assignment, the following resources were used:

  • an English blocks world corpus (used to compute the target language model)
  • a Swedish-English aligned blocks world corpus (used to compute the translation model)

For decoding (that is, for finding the most probable translation for each sentence), we used Pharaoh, an open source beam search decoder for phrase-based translation (Koehn, 2004).

3.1 Language model

The language model is used to ensure that the output is as correct as possible with respect to word choice and word order in the target sentence. The most common way to construct language models is to use n-grams. Unigram, bigram, and trigram language models for the target language (English) were constructed from the training corpus. These language models were then represented in the ARPA ngram-format(5) required by the decoder. See Appendix A below.
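
As an illustration of how n-gram scores of this kind can be obtained, the sketch below computes unsmoothed maximum-likelihood estimates as base-10 logarithms, which is the scale used in ARPA files. It is a minimal sketch under assumptions: the function name and the two example sentences are hypothetical, no smoothing or backoff weights are computed, and the estimates in Appendix A may have been obtained differently.

```python
import math
from collections import Counter


def ngram_logprobs(sentences, n):
    """Unsmoothed log10 P(w_n | w_1 .. w_{n-1}) from tokenized sentences."""
    ngrams = Counter()
    histories = Counter()
    for tokens in sentences:
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            ngrams[gram] += 1
            # For n = 1 the history is the empty tuple, so its count
            # equals the total number of tokens.
            histories[gram[:-1]] += 1
    return {g: math.log10(c / histories[g[:-1]]) for g, c in ngrams.items()}


# Two invented example sentences (not the training corpus).
corpus = [["take", "the", "cube"], ["take", "the", "cone"]]
bigrams = ngram_logprobs(corpus, 2)
for gram, logp in sorted(bigrams.items()):
    print(f"{logp:.3f} {' '.join(gram)}")
```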

3.2 Translation model

The translation model has to explain how the source corresponds, or rather translates, to the target. The task of the model is to assign a probability score to a given source-target sentence pair.

For each target word, we need to know how likely it is that this word translates into a specific source word: P(S|T). The result is a translation model in the form of a translation probability table. We estimated the translation probabilities from the aligned corpus with a simple program that, for each unique Swedish word, counts how often it co-occurs with the English word or phrase at the aligned position, and turns these counts into relative frequencies. See Appendix B below.
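
The counting step can be sketched as follows. This is not the program used in the assignment: the input format (a list of already aligned Swedish word and English phrase pairs), the function names, and the example pairs are assumptions made for illustration. Counts are normalized per Swedish word, mirroring the table in Appendix B, where the entries for each Swedish word sum to one, and the output uses the same `|||` separator.

```python
from collections import Counter, defaultdict


def phrase_table(aligned_pairs):
    """Relative frequency of each English phrase per Swedish word."""
    counts = defaultdict(Counter)
    for src, tgt in aligned_pairs:
        counts[src][tgt] += 1
    table = []
    for src, tgt_counts in counts.items():
        total = sum(tgt_counts.values())
        for tgt, count in tgt_counts.items():
            table.append((src, tgt, count / total))
    return table


def format_table(table):
    """Render entries as 'source ||| target ||| probability' lines."""
    return [f"{src} ||| {tgt} ||| {p:.3g}" for src, tgt, p in table]


# Invented aligned pairs (hypothetical input, not the real corpus).
pairs = [("kuben", "the cube"), ("kuben", "the cube"),
         ("kuben", "cube"), ("ta", "take")]
lines = format_table(phrase_table(pairs))
print("\n".join(lines))
```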

This translation model is used by the decoder to translate the source sentence into the target sentence. In this experiment only unigram translation probabilities were used. Probabilities for longer n-grams could also be included, depending on the translation quality required and the computational cost that can be afforded.

3.3 Search algorithm: Pharaoh

For decoding (i.e., finding the most probable translation), we used Pharaoh, a beam search decoder for statistical machine translation models (Koehn, 2004). Beam search explores the search space level by level: at each level only the best-scoring nodes are kept and expanded, and the remaining nodes are discarded.
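
The pruning idea can be illustrated with a toy decoder. The sketch below is not Pharaoh's algorithm: it translates the source words strictly left to right (no reordering), and the function name, the `<s>` start symbol, the `lm_score` callback, and the toy scores are all assumptions made for the example. Each source word is one level; after expanding a level, only the `beam_size` best hypotheses survive.

```python
import math


def beam_search(source, table, lm_score, beam_size=3, lm_weight=1.0):
    """Monotone beam search over word-by-word translation options.

    table maps a source word to a list of (target_phrase, probability);
    lm_score(prev_word, word) returns a log10 bigram score.
    """
    beam = [(0.0, [])]  # (log score, target words so far)
    for word in source:
        expanded = []
        for score, words in beam:
            for phrase, prob in table[word]:
                new_score = score + math.log10(prob)
                new_words = list(words)
                for w in phrase.split():
                    prev = new_words[-1] if new_words else "<s>"
                    new_score += lm_weight * lm_score(prev, w)
                    new_words.append(w)
                expanded.append((new_score, new_words))
        # Keep only the best hypotheses; the rest are discarded.
        expanded.sort(key=lambda h: h[0], reverse=True)
        beam = expanded[:beam_size]
    return beam[0]


# Toy models (invented for illustration).
toy_table = {"ta": [("take", 1.0)],
             "kuben": [("cube", 0.333), ("the cube", 0.667)]}
seen = {("<s>", "take"), ("take", "the"), ("the", "cube")}


def toy_lm(prev, word):
    return 0.0 if (prev, word) in seen else -2.0


best_score, best_words = beam_search(["ta", "kuben"], toy_table, toy_lm)
print(" ".join(best_words))
```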

4. Experiments and results

We used the sentences below for testing:

ta blocket på den gröna cirkeln

ställ blocket på cirkeln på den röda cirkeln

ställ den gröna konen på kvadraten

ta den röda konen

ställ det gröna blocket på kvadraten

ta den blåa konen på den röda cirkeln

ta kuben på den röda cirkeln

ställ den blåa konen på den röda cirkeln på den gröna cirkeln

ställ blocket på den gröna cirkeln

ställ det blåa blocket på cirkeln

The translation was first done with the default parameter settings. For example, the command

echo 'ta kuben på den röda cirkeln' | pharaoh -f pharaoh.ini

resulted in the translation ‘take cube the on red circle’.

We experimented with different parameter settings in order to improve the quality of the translation. The probability cost assigned to a translation is a product of the probability costs of four models: the phrase translation table, the language model, the reordering model, and the word penalty (Koehn, 2004).
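
One conventional way to write such a weighted combination is given below. It is offered as a sketch rather than as Pharaoh's exact formulation: the symbols for the four model scores and for the weights are our own notation. With the weights as exponents (or, equivalently, as multipliers of the log scores), lowering the language model weight relative to the translation model weight reduces the influence of the language model on the ranking of candidate translations.

```latex
\begin{align*}
\mathrm{score}(T \mid S) &=
  p_{\mathrm{tm}}(S \mid T)^{\lambda_{\mathrm{tm}}} \cdot
  p_{\mathrm{lm}}(T)^{\lambda_{\mathrm{lm}}} \cdot
  p_{\mathrm{d}}(S, T)^{\lambda_{\mathrm{d}}} \cdot
  p_{\mathrm{w}}(T)^{\lambda_{\mathrm{w}}} \\
\log \mathrm{score}(T \mid S) &=
  \lambda_{\mathrm{tm}} \log p_{\mathrm{tm}}(S \mid T) +
  \lambda_{\mathrm{lm}} \log p_{\mathrm{lm}}(T) +
  \lambda_{\mathrm{d}} \log p_{\mathrm{d}}(S, T) +
  \lambda_{\mathrm{w}} \log p_{\mathrm{w}}(T)
\end{align*}
```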

As suggested in the Pharaoh user manual, we tried out a large number of possible settings and picked the one that worked best. Setting the language model weight to 0.1 and the translation model weight to 1 produced very high quality translations. With this setting, the sentence in the previous example is translated perfectly:

echo 'ta kuben på den röda cirkeln' | pharaoh -f pharaoh.ini -tm 1 -lm 0.1

take the cube on the red circle
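
Trying out many settings by hand can also be organized as a small grid search. The sketch below is an assumption about how this could be automated, not a description of what was done in the assignment: `translate` is a hypothetical callback that runs the decoder with the given translation model and language model weights, and candidate settings are ranked by the number of test sentences that exactly match a reference translation.

```python
from itertools import product


def grid_search(translate, test_pairs, tm_values, lm_values):
    """Return (correct, tm, lm) for the setting with most exact matches."""
    best = None
    for tm, lm in product(tm_values, lm_values):
        correct = sum(translate(src, tm, lm) == ref for src, ref in test_pairs)
        if best is None or correct > best[0]:
            best = (correct, tm, lm)
    return best


# Stand-in decoder for demonstration: it only produces the reference
# word order when the language model weight is 0.1 (invented behaviour).
def fake_translate(src, tm, lm):
    return "take the cube" if lm == 0.1 else "take cube the"


best = grid_search(fake_translate, [("ta kuben", "take the cube")],
                   tm_values=[0.5, 1.0], lm_values=[0.1, 0.5, 1.0])
print(best)
```

Exact match is a strict criterion; with a larger test set a softer measure of translation quality would be more informative.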

All the test sentences were translated with these parameter settings; see Appendix C below for the results.

5. Concluding remarks

This has been an interesting assignment from which we learned a great deal. The training data is a very simple blocks world language consisting of 100 sentences, so the results cannot be generalized. Nevertheless, for this specific experiment the results were of reasonably high quality. The difference in structure between the two languages is not fully captured by the translation model used here. Including bigram and trigram translation probabilities might further improve the results.

References

Brown, P., Cocke, J., Della Pietra, S.A., Della Pietra, V.J., Jelinek, F., Lafferty, J.D., Mercer, R.L., and Roossin, P.S. 1990. A Statistical Approach to Machine Translation. Computational Linguistics 16:2.

Knight, K. 1997. Automating Knowledge Acquisition for Machine Translation. AI Magazine 18:4.

Koehn, P. 2004. Pharaoh: A Beam Search Decoder for Phrase-Based Statistical Machine Translation. User Manual and Description for Version 1.2.

Manning, C.D. and Schütze, H. 1999. Foundations of Statistical Natural Language Processing. MIT Press.

Appendix A

\data\

ngram 1=1-grams

ngram 2=2-grams

ngram 3=3-grams

\1-grams:

-0.904 e_o_s

-1.236 block

-1.461 blue

-1.2 circle

-1.446 cone

-1.528 cube

-1.678 green

-0.899 on

-1.096 put

-1.145 red

-1.2 square

-1.364 take

-0.605 the

\2-grams:

-0.899 on the

-1.095 put the

-1.095 e_o_s put

-1.145 the red

-1.316 square e_o_s

-1.352 circle e_o_s

-1.364 take the

-1.364 e_o_s take

-1.431 block on

-1.461 the blue

-1.493 the square

-1.528 cone on

-1.566 the circle

-1.566 the block

-1.586 cube on

-1.629 red circle

-1.677 the green

-1.677 red square

-1.704 the cube

-1.704 block e_o_s

-1.732 circle on

-1.794 the cone

-1.829 square on

-1.829 red block

-2.063 red cone

-2.063 green block

-2.063 blue cone

-2.130 blue square

-2.130 blue block

-2.209 green circle

-2.209 cone e_o_s

-2.209 blue circle

-2.306 blue cube

-2.431 red cube

-2.431 cube e_o_s

-2.607 green square

-2.607 green cone

-2.908 green cube

-2.908 block e_o_s

\3-grams:

-1.097 e_o_s put the

-1.354 on the red

-1.366 e_o_s take the

-1.433 block on the

-1.495 on the square

-1.530 cone on the

-1.530 circle e_o_s put

-1.548 square e_o_s put

-1.568 on the circle

-1.588 cube on the

-1.631 the square e_o_s

-1.631 the red circle

-1.680 the red square

-1.680 the block on

-1.706 square e_o_s take

-1.734 circle on the

-1.764 the circle e_o_s

-1.764 red square e_o_s

-1.764 put the red

-1.764 put the block

-1.796 the cube on

-1.796 red circle e_o_s

-1.796 put the blue

-1.831 the red block

-1.831 square on the

-1.831 circle e_o_s take

-1.869 put the cube

-1.869 on the blue

-1.910 the cone on

-1.910 block e_o_s put

-1.956 put the cone

-2.007 the circle on

-2.007 take the red

-2.007 take the block

-2.065 the square on

-2.065 the red cone

-2.065 the green block

-2.065 the blue cone

-2.065 red block on

-2.065 on the green

-2.132 the blue square

-2.132 the blue block

-2.132 take the green

-2.132 red cone on

-2.132 red circle on

-2.132 blue cone on

-2.132 block e_o_s take

-2.211 the green circle

-2.211 the blue circle

-2.211 the block e_o_s

-2.211 take the cube

-2.211 red block e_o_s

-2.211 green block e_o_s

-2.211 green block e_o_s

-2.211 cone e_o_s put

-2.211 blue circle e_o_s

-2.308 the blue cube

-2.308 take the cone

-2.308 take the blue

-2.308 put the green

-2.308 green circle e_o_s

-2.308 blue square e_o_s

-2.308 blue cube on

-2.308 blue block on

-2.433 the red cube

-2.433 the cube e_o_s

-2.433 the cone e_o_s

-2.433 red square on

-2.433 red cube on

-2.609 the green square

-2.609 the green cone

-2.609 green square e_o_s

-2.609 green cone on

-2.609 green block on

-2.609 cube e_o_s put

-2.609 blue square on

-2.910 the green cube

-2.910 red cone e_o_s

-2.910 green cube on

-2.910 green circle on

-2.910 cube e_o_s take

-2.910 blue cone e_o_s

-2.910 blue block e_o_s

-2.910 blue block e_o_s

\end\

Appendix B

ställ ||| put ||| 1

ta ||| take ||| 1

det ||| the ||| 1

den ||| the ||| 1

gröna ||| green ||| 1

blåa ||| blue ||| 1

röda ||| red ||| 1

på ||| on ||| 1

kvadraten ||| the square ||| 0.51

kvadraten ||| square ||| 0.49

cirkeln ||| the circle ||| 0.431

cirkeln ||| circle ||| 0.569

konen ||| the cone ||| 0.448

konen ||| cone ||| 0.552

kuben ||| cube ||| 0.333

kuben ||| the cube ||| 0.667

blocket ||| the block ||| 0.468

blocket ||| block ||| 0.532

Appendix C

ta blocket på den gröna cirkeln

take block on the green circle

ställ blocket på cirkeln på den röda cirkeln

put block on circle on the red circle

ställ den gröna konen på kvadraten

put the green cone on square

ta den röda konen

take the red cone

ställ det gröna blocket på kvadraten

put the green block on square

ta den blåa konen på den röda cirkeln

take the blue cone on the red circle

ta kuben på den röda cirkeln

take cube on the red circle

ställ den blåa konen på den röda cirkeln på den gröna cirkeln

put the blue cone on the red circle on the green circle

ställ blocket på den gröna cirkeln

put block on the green circle

ställ det blåa blocket på cirkeln

put the blue block on circle
