Relevant bibliography for the MSMT course


P. Koehn (2010): Statistical Machine Translation. Publisher: Cambridge University Press. ISBN-10: 0521874157


Session 1a (day 1)

W. John Hutchins (2007): Machine translation: a concise history, Computer aided translation: theory and practice

Session 1b (days 1 and 2)


K. Knight (1999): A Statistical MT Tutorial Workbook.

P. Koehn, F. J. Och and D. Marcu (2003): Statistical Phrase-Based Translation, Proceedings of the Joint Conference on Human Language Technologies and the Annual Meeting of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL).

I. J. Good (1953): The population frequency of species and the estimation of population parameters, Biometrika


F. J. Och, N. Ueffing and H. Ney (2001): An Efficient A* Search Algorithm for Statistical Machine Translation, Workshop on Data-Driven Machine Translation at the 39th Annual Meeting of the Association for Computational Linguistics (ACL).

K. Knight (1999): Decoding Complexity in Word-Replacement Translation Models, Computational Linguistics, Squibs & Discussion, 25(4).

K. Papineni, S. Roukos, T. Ward and W.-J. Zhu (2002): BLEU: a Method for Automatic Evaluation of Machine Translation. In ACL-2002: 40th Annual Meeting of the Association for Computational Linguistics, pages 311-318.

A. Lavie and A. Agarwal (2007). METEOR: an automatic metric for MT evaluation with high levels of correlation with human judgments. In Proceedings of the Second ACL Workshop on Statistical Machine Translation, pages 228–231, Prague, Czech Republic.

J. Roturier (2009): Deploying novel MT technology to raise the bar for quality: a review of key advantages and challenges. In Proceedings of the MT Summit XII, pages 1-8, Ottawa, Canada.

C. Callison-Burch, M. Osborne and P. Koehn (2006): Re-evaluating the role of BLEU on machine translation research. In Proceedings of EACL-2006, pages 249-256, Trento, Italy.

Session 1c (day 2)

Koehn, P. and Schroeder, J. (2007): Experiments in Domain Adaptation for Statistical Machine Translation, Proceedings of the Second Workshop on Statistical Machine Translation.

Lü, Y., Huang, J. and Liu, Q. (2007): Improving Statistical Machine Translation Performance by Training Data Selection and Optimization, Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL).

Wu, H. and Wang, H. and Zong, C. (2008): Domain Adaptation for Statistical Machine Translation with Domain Dictionary and Monolingual Corpora, Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

P. Banerjee, S. K. Naskar, J. Roturier, A. Way and J. Van Genabith (2011): Domain adaptation in statistical machine translation of user-forum data using component level mixture modelling. In Proceedings of the 13th Machine Translation Summit, pp. 285–292.

Rogati, M. (2009) Domain Adaptation of Translation Models for Multilingual Applications, PhD proposal

C. A. Henriquez Q., J. B. Mariño and R. E. Banchs (2011): Deriving translation units using small additional corpora. Proceedings of the 15th Annual Conference of the European Association for Machine Translation, pages 121-128, Leuven, Belgium.

Session 1d (day 3)

D. Chiang (2005): A hierarchical phrase-based model for statistical machine translation. In Proceedings of the Association for Computational Linguistics (ACL) 2005, pages 263–270.

J. B. Mariño, R. E. Banchs, J. M. Crego, A. de Gispert, P. Lambert, J. A. R. Fonollosa, and M. R. Costa-jussà (2006): N-gram based machine translation. Computational Linguistics, 32(4):527–549.

A. Zollmann and A. Venugopal (2006): Syntax augmented machine translation via chart parsing. In Proceedings of the North American Association for Computational Linguistics Conference (NAACL).

A. Zollmann, A. Venugopal, F. J. Och, and J. Ponte (2008): A systematic comparison of phrase-based, hierarchical and syntax-augmented statistical MT. In Proceedings of Coling 2008, pages 1145–1152, Manchester.

Session 2a (day 4)

Franz Josef Och, Hermann Ney (2003): A Systematic Comparison of Various Statistical Alignment Models, Computational Linguistics, vol. 29, pp. 19-51.

Koehn, P. and Hoang, H. and Birch, A. and Callison-Burch, C. and Federico, M. and Bertoldi, N. and Cowan, B. and Shen, W. and Moran, C. and Zens, R. and Dyer, C. J. and Bojar, O. and Constantin, A. and Herbst, E. (2007): Moses: Open Source Toolkit for Statistical Machine Translation, Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

Stolcke, A. (2002): SRILM - An Extensible Language Modeling Toolkit, in Proc. Intl. Conf. Spoken Language Processing, Denver, Colorado.

M. Federico, N. Bertoldi, M. Cettolo (2008): IRSTLM: an Open Source Toolkit for Handling Large Scale Language Models, Interspeech 2008, pp. 1618-1621, Brisbane, Australia.

K. Heafield (2011): KenLM: Faster and Smaller Language Model Queries. Proceedings of the EMNLP 2011 Sixth Workshop on Statistical Machine Translation. Edinburgh, UK.

Session 2b (day 5)

Source Context features

C. España-Bonet, J. Giménez, Ll. Màrquez (2009): Discriminative Phrase-Based Models for Arabic Machine Translation. ACM Transactions on Asian Language Information Processing (TALIP), vol. 8, no. 4, pages 1-20.

Costa-jussà, M. R. and Banchs, R. E. (2011): A vector-space dynamic feature for phrase-based statistical machine translation. Journal of Intelligent Information Systems, Volume 37, Issue 2, pages 139-154.


Dyer, C. J., Muresan, S. and Resnik, P. (2008): Generalizing Word Lattice Translation, Proceedings of ACL-08: HLT.

Carpuat, M. (2009) Toward Using Morphology in French-English Phrase-Based SMT. EACL 2009 Fourth Workshop on Statistical Machine Translation. Athens, Greece: March.

Sarikaya, R. and Deng, Y. (2007): Joint Morphological-Lexical Language Modeling for Machine Translation, Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers.

Minkov, E., Toutanova, K. and Suzuki, H. (2007): Generating Complex Morphology for Machine Translation, Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics.


Costa-jussà, Marta R. and Fonollosa, José A. R. (2009): An Ngram Reordering Model. Computer Speech and Language, Volume 23, Issue 3, pages 362-375.

Khalilov, M. and Sima'an, K. (to appear): Statistical Translation After Source Reordering. Cambridge Journal of Natural Language Engineering.

R. Tromble, J. Eisner (2009): Learning linear ordering problems for better translation. EMNLP '09: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Morristown, NJ, USA, pp. 1007-1016.

K. Imamura, H. Okuma, and E. Sumita (2005): Practical approach to syntax-based statistical machine translation. In Proceedings of the 10th Machine Translation Summit (MT Summit X), pages 267–274.

Hybrid Machine Translation

Forcada, M. L., Tyers, F. M. & Ramírez, G. (2009): The Apertium Machine Translation Platform: five years on. First International Workshop on Free/Open-Source Rule-Based Machine Translation, Alacant, Spain.

Carbonell, J., Klein, St., Miller, D., Steinbaum, M., Grassiany, T. and Frey, J. (2006): Context-based Machine Translation. Proc. 7th Conference of the Association for Machine Translation in the Americas (AMTA), Cambridge, MA.

G. Labaka, A. Díaz de Ilarraza, C. España-Bonet, Ll. Màrquez, K. Sarasola (2011): Deep evaluation of hybrid architectures: simple metrics correlated with human judgments. Proceedings of the International Workshop on Using Linguistic Information for Hybrid Machine Translation (LIHMT), Barcelona.

Thurmair, G. (2009): Comparing different architectures of hybrid machine translation. Proc. of MT Summit XII, Ottawa, Canada.


Popović, M., & Ney, H. (2009): Syntax-oriented evaluation measures for machine translation output. In Proceedings of the Fourth Workshop on Statistical Machine Translation (pp. 29-32). Athens: ACL.

Giménez, J., & Màrquez, L. (2007): Linguistic features for automatic evaluation of heterogeneous MT systems. In Proceedings of the Second Workshop on Statistical Machine Translation (pp. 256-264). Prague: ACL.

J. Giménez and Ll. Màrquez (2010): Asiya: An Open Toolkit for Automatic Machine Translation (Meta-)Evaluation. The Prague Bulletin of Mathematical Linguistics, No. 94.

Farrús, M., Costa-jussà, M. R. and Popović, M. (2012): Study and correlation analysis of linguistic, perceptual and automatic machine translation evaluations. Journal of the American Society for Information Science and Technology (JASIST), Volume 63, Issue 1, pages 174-184.

S. O’Brien (2011): Towards predicting post-editing productivity. Machine Translation 25(3), 197-215.

Session 3 (day 5)

M. Plitt, F. Masselot (2010): A Productivity Test of Statistical Machine Translation Post-Editing in a Typical Localisation Context. The Prague Bulletin of Mathematical Linguistics, (93), 7-16.

M. Khalilov and R. Choudhury (2012): Building English-Chinese and Chinese-English MT engines for computer software domain. In Proceedings of EAMT 2012, pages 7-11.

A. Ruopp (2010): The Moses for Localization Open Source Project. In Proceedings of AMTA 2010, Denver, Colorado.

A. Ruopp and F. Xia (2008): Finding parallel texts on the web using cross-language information retrieval. In Proceedings of the 2nd Workshop on Cross Lingual Information Access (CLIA): Addressing the Information Need of Multilingual Societies, Hyderabad, India.

M. Volk, R. Sennrich, C. Hardmeier, F. Tidstrom (2010): Machine Translation of TV Subtitles for Large Scale Production. Proceedings of the Second Joint EM+/CNGL Workshop on Bringing MT to the User: Research on Integrating MT in the Translation Industry. Denver, 4 November 2010, 53-62.

T. Hoar (2009): Technical guide to SMT training data. TAUS Report.