GSLT: Statistical Methods for NLP, spring term 2004
Practical Assignment 2: Statistical Machine Translation
Atelach Alemu Argaw
Kristina Nilson
1. Introduction
The purpose of this practical assignment is to construct a simple statistical machine translation system that translates individual sentences (in the form of blocks world commands) from Swedish to English, using an aligned translation corpus as training data.
2. Statistical Machine Translation
The idea of statistical machine translation is that by examining parallel data one should be able to learn how one language translates to another. Following e.g., (Brown, 1990), the translation process is viewed as a noisy channel problem, where it is assumed that a source signal is passed through a noisy channel which garbles it into a distorted target signal, and that the output of the channel depends probabilistically on the input (that is, the distortion is not random). Applied to machine translation, this means that we try to reason backwards to find the source sentence that is most likely to produce the target sentence, given the properties of the noisy channel (Manning & Schütze, 1999).
According to this method, the most probable translation T of a source sentence S can be found by maximizing the product of the string probability of T and the translation probability of S given T:
T* = argmax_T P(T) P(S | T)
(by Bayes' rule, P(T | S) is proportional to P(T) P(S | T), since P(S) is fixed for a given source sentence)
In order to try out this statistical translation method, we need:
- a language model P(T), to correct the output in the target language
- a translation model P(S|T), to ensure a correct translation
- an algorithm for finding the optimal translation of a sentence, given all possible translations. That is, we will try to find the target sentence that maximizes the product of the language model and the translation model for each source sentence (Knight, 1997).
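The noisy-channel argmax above can be sketched in a few lines of Python. This is a minimal illustration only: the two candidate translations and the toy probabilities are invented for the example, not taken from the actual models.

```python
import math

def best_translation(source, candidates, lm_logprob, tm_logprob):
    # Noisy-channel decoding: pick the target sentence T that maximizes
    # P(T) * P(S|T), computed in log space to avoid underflow.
    return max(candidates,
               key=lambda t: lm_logprob(t) + tm_logprob(source, t))

# Toy models with made-up probabilities, for illustration only.
lm = {"take the cube": math.log(0.04),     # fluent English: high P(T)
      "take cube the": math.log(0.0001)}   # garbled English: low P(T)
tm = {("ta kuben", "take the cube"): math.log(0.5),
      ("ta kuben", "take cube the"): math.log(0.5)}

best = best_translation("ta kuben",
                        ["take the cube", "take cube the"],
                        lambda t: lm[t],
                        lambda s, t: tm[(s, t)])
# With equal translation scores, the language model breaks the tie
# in favour of the fluent word order.
```

Note how the two models divide the labour: the translation model scores both word orders equally, and the language model selects the fluent one.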
3. Translation of blocks world commands
For this assignment, the following resources were used:
- an English blocks world corpus (used to compute the target language model)
- a Swedish-English aligned blocks world corpus (used to compute the translation model)
For decoding (that is, finding the most probable translation for each sentence), Pharaoh, an open-source beam search decoder for phrase-based translation (Koehn, 2004), was used.
3.1 Language model
The language model is used to make the output as correct as possible with respect to word choice and word order in the target sentence. The most common way to construct language models is to use n-grams. Unigram, bigram, and trigram language models for the target language (English) were constructed from the training corpus. These language models were then represented in the ARPA n-gram format, ngram-format(5), required by the decoder employed. See Appendix A below.
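As an illustration of how such a model is estimated, the sketch below computes unsmoothed bigram probabilities by relative frequency, stored as base-10 logarithms as in the ARPA format. The two training sentences are invented examples in the style of the corpus, and e_o_s marks sentence boundaries as in Appendix A.

```python
import math
from collections import Counter

def bigram_logprobs(sentences):
    # Maximum-likelihood bigram estimates, stored as base-10 logs
    # as in the ARPA n-gram format; e_o_s marks sentence boundaries.
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["e_o_s"] + sent.split() + ["e_o_s"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return {bg: math.log10(n / unigrams[bg[0]])
            for bg, n in bigrams.items()}

probs = bigram_logprobs(["take the cube",
                         "put the cube on the circle"])
# "cube" follows "the" in 2 of the 3 occurrences of "the",
# so probs[("the", "cube")] == log10(2/3), about -0.176
```

A real model would add smoothing and backoff weights; this sketch shows only the counting and normalization step.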
3.2 Translation model
The translation model describes how the source corresponds to, or rather translates into, the target. The task of the model is to assign a probability score to a given source-target sentence pair.
For each target word, we need to know how likely it is that this word translates into a specific source word: P(S|T). The result is a translation model in the form of a translation probability table. We estimated the translation probabilities from the aligned corpus by implementing a simple program that calculates the co-occurrence frequency of each unique Swedish word and the English word or phrase aligned with it at the corresponding position. See Appendix B below.
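The relative-frequency estimation can be sketched as follows. The function name and the three-pair mini-corpus are ours, invented for illustration; they are not the actual implementation or data.

```python
from collections import Counter, defaultdict

def translation_table(aligned_pairs):
    # Estimate translation probabilities by relative frequency:
    # for each Swedish word, count how often each aligned English
    # word or phrase co-occurs with it, then normalize.
    counts = defaultdict(Counter)
    for sv, en in aligned_pairs:
        counts[sv][en] += 1
    return {sv: {en: n / sum(ens.values()) for en, n in ens.items()}
            for sv, ens in counts.items()}

# Three hypothetical aligned occurrences of "kuben".
pairs = [("kuben", "the cube"), ("kuben", "the cube"), ("kuben", "cube")]
table = translation_table(pairs)
# table["kuben"]["the cube"] == 2/3, table["kuben"]["cube"] == 1/3
```

Normalizing the counts per Swedish word yields the kind of probability table shown in Appendix B.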
This translation model is used by the decoder to translate the source sentence into the target sentence. In this experiment, unigram translation probabilities were used. Higher-order n-gram probabilities could also be included, depending on the required quality of the translation output and the computational cost that can be afforded.
3.3 Search algorithm: Pharaoh
For decoding (i.e., finding the most probable translation), we used Pharaoh, a beam search decoder for statistical machine translation models (Koehn, 2004). Beam search moves downward level by level; at each level, only the best nodes are expanded and the rest are discarded.
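The pruning idea can be sketched generically. This is our own minimal formulation of beam search, not Pharaoh's actual implementation: at each level, only the `beam_width` highest-scoring hypotheses survive to be expanded.

```python
import heapq

def beam_search(start, expand, score, beam_width, depth):
    # Generic beam search: expand level by level, keeping only the
    # beam_width best hypotheses and discarding the rest.
    level = [start]
    for _ in range(depth):
        candidates = [child for hyp in level for child in expand(hyp)]
        if not candidates:
            break
        level = heapq.nlargest(beam_width, candidates, key=score)
    return max(level, key=score)

# Toy example: grow tuples of digits, scoring a hypothesis by its sum.
best = beam_search((), lambda h: [h + (d, ) for d in (1, 2, 3)],
                   sum, beam_width=2, depth=2)
# best == (3, 3)
```

Because low-scoring hypotheses are discarded early, the search is fast but not guaranteed to find the global optimum.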
4. Experiments and results
We used the sentences below for testing:
ta blocket på den gröna cirkeln
ställ blocket på cirkeln på den röda cirkeln
ställ den gröna konen på kvadraten
ta den röda konen
ställ det gröna blocket på kvadraten
ta den blåa konen på den röda cirkeln
ta kuben på den röda cirkeln
ställ den blåa konen på den röda cirkeln på den gröna cirkeln
ställ blocket på den gröna cirkeln
ställ det blåa blocket på cirkeln
The translation was first performed with the default parameter settings. For example,
echo 'ta kuben på den röda cirkeln' | pharaoh -f pharaoh.ini
produced the translation 'take cube the on red circle'.
To improve translation quality, we experimented with different settings of the model parameters. The probability cost assigned to a translation is a product of the probability costs of four models: the phrase translation table, the language model, the reordering model, and the word penalty (Koehn, 2004).
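In log space this weighted product becomes a weighted sum, which is what the per-model weights adjust. A minimal sketch, with made-up log probabilities for the four models:

```python
def weighted_logcost(logprobs, weights):
    # Product of model probabilities, each raised to its weight,
    # computed as a weighted sum in log space:
    #   log(prod_i p_i ** w_i) = sum_i w_i * log(p_i)
    return sum(w * lp for w, lp in zip(weights, logprobs))

# Made-up log probabilities for the four models (translation,
# language, reordering, word penalty), weighted as in our experiment.
cost = weighted_logcost([-2.0, -5.0, -0.5, -1.0],
                        [1.0, 0.1, 1.0, 1.0])
# cost == 1*-2.0 + 0.1*-5.0 + 1*-0.5 + 1*-1.0 == -4.0
```

Lowering a model's weight shrinks its influence on the total cost; a language model weight of 0.1, for instance, makes the fluency score count for a tenth of its unweighted contribution.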
As suggested in the Pharaoh user manual, we tried a large number of possible settings and picked what worked best. Setting the language model weight to 0.1 and the translation model weight to 1 produced very high quality translations. With these settings, the sentence in the previous example receives a perfect translation:
echo 'ta kuben på den röda cirkeln' | pharaoh -f pharaoh.ini -tm 1 -lm 0.1
take the cube on the red circle
All the test sentences were translated with these parameter settings; see Appendix C below for the results.
5. Concluding remarks
This was an interesting assignment from which we learned a great deal. The training data is a very simple blocks world language of only 100 sentences, so the results cannot be generalized. For this specific experiment, however, the results were of reasonably high quality. The structural differences between the two languages are not fully captured by the translation model provided; including bigram and trigram translation probabilities might further improve the results.
References
Brown, P., Cocke, J., Della Pietra, S.A., Della Pietra, V.J., Jelinek, F., Lafferty, J.D., Mercer, R.L., and Roosin, P.S. 1990. A Statistical Approach to Machine Translation. Computational Linguistics 16:2.
Knight, K. 1997. Automating Knowledge Acquisition for Machine Translation. AI Magazine 18:4.
Koehn, P. 2004. Pharaoh: A Beam Search Decoder for Phrase-Based Statistical Machine Translation. User Manual and Description for Version 1.2.
Manning C.D. and Schütze, H. 1999. Foundations of Statistical Natural Language Processing. MIT Press.
Appendix A
\data\
ngram 1=13
ngram 2=39
ngram 3=84
\1-grams:
-0.904 e_o_s
-1.236 block
-1.461 blue
-1.2 circle
-1.446 cone
-1.528 cube
-1.678 green
-0.899 on
-1.096 put
-1.145 red
-1.2 square
-1.364 take
-0.605 the
\2-grams:
-0.899 on the
-1.095 put the
-1.095 e_o_s put
-1.145 the red
-1.316 square e_o_s
-1.352 circle e_o_s
-1.364 take the
-1.364 e_o_s take
-1.431 block on
-1.461 the blue
-1.493 the square
-1.528 cone on
-1.566 the circle
-1.566 the block
-1.586 cube on
-1.629 red circle
-1.677 the green
-1.677 red square
-1.704 the cube
-1.704 block e_o_s
-1.732 circle on
-1.794 the cone
-1.829 square on
-1.829 red block
-2.063 red cone
-2.063 green block
-2.063 blue cone
-2.130 blue square
-2.130 blue block
-2.209 green circle
-2.209 cone e_o_s
-2.209 blue circle
-2.306 blue cube
-2.431 red cube
-2.431 cube e_o_s
-2.607 green square
-2.607 green cone
-2.908 green cube
-2.908 block e_o_s
\3-grams:
-1.097 e_o_s put the
-1.354 on the red
-1.366 e_o_s take the
-1.433 block on the
-1.495 on the square
-1.530 cone on the
-1.530 circle e_o_s put
-1.548 square e_o_s put
-1.568 on the circle
-1.588 cube on the
-1.631 the square e_o_s
-1.631 the red circle
-1.680 the red square
-1.680 the block on
-1.706 square e_o_s take
-1.734 circle on the
-1.764 the circle e_o_s
-1.764 red square e_o_s
-1.764 put the red
-1.764 put the block
-1.796 the cube on
-1.796 red circle e_o_s
-1.796 put the blue
-1.831 the red block
-1.831 square on the
-1.831 circle e_o_s take
-1.869 put the cube
-1.869 on the blue
-1.910 the cone on
-1.910 block e_o_s put
-1.956 put the cone
-2.007 the circle on
-2.007 take the red
-2.007 take the block
-2.065 the square on
-2.065 the red cone
-2.065 the green block
-2.065 the blue cone
-2.065 red block on
-2.065 on the green
-2.132 the blue square
-2.132 the blue block
-2.132 take the green
-2.132 red cone on
-2.132 red circle on
-2.132 blue cone on
-2.132 block e_o_s take
-2.211 the green circle
-2.211 the blue circle
-2.211 the block e_o_s
-2.211 take the cube
-2.211 red block e_o_s
-2.211 green block e_o_s
-2.211 green block e_o_s
-2.211 cone e_o_s put
-2.211 blue circle e_o_s
-2.308 the blue cube
-2.308 take the cone
-2.308 take the blue
-2.308 put the green
-2.308 green circle e_o_s
-2.308 blue square e_o_s
-2.308 blue cube on
-2.308 blue block on
-2.433 the red cube
-2.433 the cube e_o_s
-2.433 the cone e_o_s
-2.433 red square on
-2.433 red cube on
-2.609 the green square
-2.609 the green cone
-2.609 green square e_o_s
-2.609 green cone on
-2.609 green block on
-2.609 cube e_o_s put
-2.609 blue square on
-2.910 the green cube
-2.910 red cone e_o_s
-2.910 green cube on
-2.910 green circle on
-2.910 cube e_o_s take
-2.910 blue cone e_o_s
-2.910 blue block e_o_s
-2.910 blue block e_o_s
\end\
Appendix B
ställ ||| put ||| 1
ta ||| take ||| 1
det ||| the ||| 1
den ||| the ||| 1
gröna ||| green ||| 1
blåa ||| blue ||| 1
röda ||| red ||| 1
på ||| on ||| 1
kvadraten ||| the square ||| 0.51
kvadraten ||| square ||| 0.49
cirkeln ||| the circle ||| 0.431
cirkeln ||| circle ||| 0.569
konen ||| the cone ||| 0.448
konen ||| cone ||| 0.552
kuben ||| cube ||| 0.333
kuben ||| the cube ||| 0.667
blocket ||| the block ||| 0.468
blocket ||| block ||| 0.532
Appendix C
ta blocket på den gröna cirkeln
take block on the green circle
ställ blocket på cirkeln på den röda cirkeln
put block on circle on the red circle
ställ den gröna konen på kvadraten
put the green cone on square
ta den röda konen
take the red cone
ställ det gröna blocket på kvadraten
put the green block on square
ta den blåa konen på den röda cirkeln
take the blue cone on the red circle
ta kuben på den röda cirkeln
take cube on the red circle
ställ den blåa konen på den röda cirkeln på den gröna cirkeln
put the blue cone on the red circle on the green circle
ställ blocket på den gröna cirkeln
put block on the green circle
ställ det blåa blocket på cirkeln
put the blue block on circle