Detailed PhD project

Provisional title:

EN: Contrastive linguistic analysis of English and French IT texts in the field of cyberdefense

FR: Analyse linguistique contrastive de l’anglais et du français de l’informatique appliquée à la cyberdéfense

Research Question:

Can a comparative corpus-based analysis of the English and French specialised language used in specialised press texts dealing with cyberdefense be used to improve the quality of professional translations and/or translator training?

Primary Scientific Field: Linguistics

Secondary Field: Natural Language Processing (NLP)

Project Summary:

One of the major developments in linguistic research has come from the possibility of studying vast amounts of text with computer-enhanced tools, particularly text retrieval and concordancing programs such as AntConc. The basic procedure for querying a text corpus consists of producing multiple concordance lines, the so-called Key Word In Context (KWIC) concordance, for a specified string of characters: a word, a lemma or a phrase. The citations thus obtained can be sorted to reveal recurring clusters of words. The analysis of these recurring patterns shows how language actually behaves in context, and complements and sometimes challenges the information provided by standard reference tools such as dictionaries and grammars.
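To make the KWIC procedure concrete, the following is a minimal Python sketch of what a concordancer does at its core. It finds every occurrence of a node word and keeps a fixed window of context on each side. It then sorts the lines on the right-hand context and counts the clusters that begin with the node word. It is a simplified illustration, not AntConc's actual implementation. The regular-expression tokeniser, the function names (`tokenize`, `kwic`, `right_clusters`, `show`) and the sample sentences are all assumptions. A real study would also need lemmatisation and proper handling of French elisions such as *l'attaque*.

```python
import re
from collections import Counter


def tokenize(text):
    # Naive tokeniser (assumption): lower-cases the text and keeps
    # hyphenated words and straight-apostrophe forms as single tokens.
    return re.findall(r"\w+(?:[-']\w+)*", text.lower())


def kwic(tokens, node, width=4):
    """Return (left, node, right) concordance lines for every hit of node."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append((left, tok, right))
    # Sort on the right-hand context so that recurring patterns line up.
    lines.sort(key=lambda line: line[2])
    return lines


def right_clusters(lines, size=2):
    """Count clusters made of the node word plus the next size-1 tokens."""
    counts = Counter()
    for _, node, right in lines:
        if len(right) >= size - 1:
            counts[" ".join([node] + right[:size - 1])] += 1
    return counts


def show(lines, width=30):
    # Print each line with the node word aligned in a central column.
    for left, node, right in lines:
        print(f"{' '.join(left):>{width}} [{node}] {' '.join(right)}")


# Hypothetical sample text, not taken from the project corpus.
text = ("A cyber attack hit the network. The cyber attack was detected early. "
        "Another cyber threat followed.")
lines = kwic(tokenize(text), "cyber", width=3)
show(lines)
print(right_clusters(lines).most_common())
```

Sorting on the right-hand context is only one option. Concordancers also allow sorting on the left-hand context, which brings together the words that precede the node word.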

The research project involves acquiring or collecting a comparable corpus of English and French texts in a specialised sub-domain of computer science, analysing the texts and annotating them with specific linguistic information, and determining whether the findings can be applied to improve the comprehension and production skills of translators and communicators, to improve QA processes for future professional translations in that field and/or to improve the training of professional translators.

This PhD project would be innovative in two respects. First, it addresses a specific sub-domain of computer science, cyberdefense, which to the best of our knowledge no one else has studied in this way. Second, it uses a methodology that has not yet been applied to this language combination and sub-domain. The project thus takes an original approach to a seldom explored specialised language.

Mots-clés: corpus comparables, langue de spécialité, linguistique de corpus, linguistique contrastive, cyberdéfense

Keywords: comparable corpora, specialised language, corpus linguistics, contrastive linguistics, cyberdefense

Suggested Methodology:

The researcher will use corpus analysis techniques to investigate whether a comparative corpus-based analysis of the English and French specialised language used in specialised press texts dealing with cyberdefense can improve the quality of professional translations and/or translator training. This research will entail the collection and analysis of comparable corpora, for instance texts from Misc for French and from Digital Forensic Magazine for English.
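Because the English and French comparable corpora will almost certainly differ in size, raw counts of a term and its candidate equivalent cannot be compared directly. A common corpus-linguistic step is to normalise counts per million words. The sketch below illustrates that step under stated assumptions. It takes pre-tokenised, lower-cased text. The helper names `count_term` and `per_million`, the toy token lists and the term pair *malware* / *logiciel malveillant* are all hypothetical illustrations, not findings from Misc or Digital Forensic Magazine.

```python
def count_term(tokens, term):
    """Count occurrences of a (possibly multiword) term in a token list."""
    parts = term.lower().split()
    n = len(parts)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == parts)


def per_million(tokens, term):
    """Relative frequency of term per million tokens (0.0 for an empty corpus)."""
    if not tokens:
        return 0.0
    return count_term(tokens, term) * 1_000_000 / len(tokens)


# Toy stand-ins for the English and French corpora (hypothetical data).
tokens_en = "malware spreads fast and malware hides".split()
tokens_fr = "un logiciel malveillant se propage vite".split()

print(per_million(tokens_en, "malware"))               # 2 hits in 6 tokens
print(per_million(tokens_fr, "logiciel malveillant"))  # 1 hit in 6 tokens
```

Counting multiword sequences matters here because French specialised terms are often phrasal where the English equivalent is a single word.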

Academic Impact:

The development and use of an innovative methodology based on cross-disciplinary approaches to linguistics and cyberdefense will strengthen the academic knowledge base and provide a solid basis for training highly skilled researchers. It will contribute to laying the foundations for longer-term collaborations with other disciplines and centres, both within and outside the French academic community. As such, the PhD project will enhance the visibility of HTCI and UBO in the increasingly important fields of computational linguistics and applied translation studies research, both nationally and internationally.

The research has the potential to lead to at least two national or international publications by 2018. It is also expected that a series of events (workshops, seminars, colloquia) will be organised with the active involvement of the researcher to build a network with other scholars and centres working in the fields of linguistics and cyberdefense.

Societal Impact:

The project is also expected to have a significant impact beyond the academic community, as cyberdefense is one of the priorities recently defined in the French government's industrial strategy and a crucial concern for European society today.

At the industrial level, research in this emerging field will directly and indirectly benefit the language industry as well as agencies and companies involved in cyberdefense, at both French and EU levels.

Big data analysis and language issues relating to cyberdefense and/or cybersecurity are becoming increasingly central in many sectors: the air transport industry (fares and safety data), healthcare (clinical outcomes data) and the recorded music sector (track-based metadata, i.e. data about data) are only a few examples.

Eligibility Criteria:

The successful applicant will ideally come from a translation studies background, though candidates from other disciplines with translation expertise and experience may be considered. All candidates must satisfy the University’s minimum doctoral entry criteria. Applicants should also have research interests in linguistics and translation studies.

Additionally, candidates will be expected to have a solid IT background, with expert knowledge in at least one of the following areas: intellectual property, data protection or cybersecurity. We encourage submission of original and thought-provoking research proposals addressing comparable corpora in the field of cyberdefense. The corpus could, for instance, be collected from Misc and Digital Forensic Magazine.

Bibliography

Comparable Corpora:

- Bouaud J., Habert B., Nazarenko A. & Zweigenbaum P. (2000). Regroupements issus de dépendances syntaxiques sur un corpus de spécialité: catégorisation et confrontation à deux conceptualisations du domaine. In J. Charlet, M. Zacklad, G. Kassel & D. Bourigault (eds), Ingénierie des connaissances: évolutions récentes et nouveaux défis, chapter 17, p. 275-290. Paris: Eyrolles.

- Bourigault D. (2002). Upery: un outil d'analyse distributionnelle étendue pour la construction d'ontologies à partir de corpus. In J.-M. Pierrel (ed.), Actes de TALN 2002 (Traitement automatique des langues naturelles), p. 75-84. Nancy: ATALA ATILF.

- Chiao Y.-C. (2004). Extraction lexicale bilingue à partir de textes médicaux comparables: application à la recherche d'information translangue. PhD thesis in medical informatics, Université Paris 6.

- Chiao Y.-C., Sta J.-D. & Zweigenbaum P. (2004). A novel approach to improve word translations extraction from non-parallel, comparable corpora. In Proceedings of the International Joint Conference on Natural Language Processing, Hainan, China: AFNLP.

- Chiao Y.-C. & Zweigenbaum P. (2002). Looking for candidate translational equivalents in specialized, comparable corpora. In Proceedings of the 19th COLING, p. 1208-1212, Taipei, Taiwan.

- Déjean H., Gaussier E., Renders J.-M. & Sadat F. (2005). Automatic processing of multilingual medical terminology: applications to thesaurus enrichment and cross-language information retrieval. Artificial Intelligence in Medicine, 33(2), 111-124.

- Déjean H. & Gaussier É. (2002). Une nouvelle approche à l'extraction de lexiques bilingues à partir de corpus comparables. Lexicometrica, special issue: Alignement lexical dans les corpus multilingues, ed. Jean Véronis.

- Fung P. & McKeown K. (1997). Finding terminology translations from non-parallel corpora. In Proceedings of the Fifth Annual Workshop on Very Large Corpora, p. 192-202: ACL.

- Fung P. & Yee L. Y. (1998). An IR approach for translating new words from non-parallel, comparable texts. In Proceedings of the 36th ACL, p. 414-420, Montréal.

- Habert B. & Zweigenbaum P. (2003). Classer les mots: sémantique à gros grain et méthodologie harrissienne. Revue de sémantique et de pragmatique, (12), 25-45.

- Rapp R. (1995). Identifying word translations in non-parallel texts. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, student session, volume 1, p. 321-322, Boston, Mass.

- Rapp R. (1999). Automatic identification of word translations from unrelated English and German corpora. In Proceedings of the 37th ACL, College Park, Maryland.

- Sadat F., Yoshikawa M. & Uemura S. (2003). Learning bilingual translations from comparable corpora to cross-language information retrieval: Hybrid statistics-based and linguistics-based approach. In J. Adachi & K.-F. Wong (eds), Proceedings of the Sixth International Workshop on Information Retrieval with Asian Languages, p. 57-64.

- Zweigenbaum P. & Habert B. (2004). Accès mesurés aux sens. Mots, (74), 93-106.

- Zweigenbaum P. & Habert B. (2006). Faire se rencontrer les parallèles: Regards croisés sur l'acquisition lexicale monolingue et multilingue. Glottopol, (8), 22-44.

Corpora and Translation Teaching:

- Arrouart Catherine, 2003, «Les mémoires de traduction et la formation universitaire: quelques pistes de réflexion», in Meta, vol. 48, n°3, Les Presses de l’Université de Montréal.

- Bédard Claude, 2000, «Mémoire de traduction cherche traducteur de phrases…», in Traduire, n°186.

- Bernardini Silvia, 2006, «Corpora for translators and translation practices. Achievements and challenges», in Proceedings of LREC (Language Resources and Evaluation Conference), p.17-22.

- Bowker Lynne, 1998, «Using Specialized Monolingual Native-Language Corpora as a Translation Resource: a Pilot Study», in Meta, vol.43, n°4, p.631-651.

- Bowker Lynne, 2002, Computer-Aided Translation Technology: A Practical Introduction, Ottawa, University of Ottawa Press, Didactics of Translation Series.

- Bowker Lynne, Barlow Michael, 2008, «A comparative evaluation of bilingual concordancers and translation memory systems», in Topics in Language Resources for Translation and Localisation, Yuste Rodrigo, Elia (ed.), p.1-22.

- Frérot Cécile, Josselin-Leray Amélie, 2008, «Contribution des corpus à l’enrichissement des dictionnaires bilingues généraux. Application au domaine de la volcanologie», in Autour des langues et du langage. Perspective interdisciplinaire, Presses Universitaires de Grenoble, p.415-422.

- Gouadec Daniel, 2006, «Profils de compétences et formation universitaire», Colloque de l’université Rennes 2: Quelle qualification universitaire pour les traducteurs?

- Gow Francie, 2003, Metrics for Evaluating Translation Memory Software, thesis under the supervision of L. Bowker, School of Translation and Interpretation, University of Ottawa.

- Granger Sylviane, 2003, «The corpus approach: a common way forward for Contrastive Linguistics and Translation Studies», in Granger S., Lerot J. and Petch-Tyson S. (eds), Corpus-based Approaches to Contrastive Linguistics and Translation Studies, Amsterdam & Atlanta, Rodopi, p.17-29.

- Josselin-Leray Amélie, 2005, Place et rôle des terminologies dans les dictionnaires généraux unilingues et bilingues. Etude d’un domaine de spécialité: volcanologie, PhD thesis, Université Lyon II.

- Laviosa Sara, 2002, Corpus-Based Translation Studies: Theory, Findings, Applications, Amsterdam, New York, Rodopi.

- MeLLANGE, 2006, Corpora & E-learning Survey Results, online, accessed 20 November 2008.

- Pearson Jennifer, 2003, «Using Parallel Texts in Translation Training Environment», in Zanettin Federico, Bernardini Silvia and Stewart Dominic (eds), Corpora in Translator Education, Manchester, St Jerome, p.15-24.

- Bowker Lynne, Pearson Jennifer, 2002, Working with Specialized Language. A practical guide to using corpora, Routledge.

- Sauron Véronique, 2007, «Les nouvelles technologies dans l’enseignement de la traduction: l’exemple de la traduction juridique», in Lavault Elisabeth (ed.), Traduction spécialisée: pratiques, théories, formations, Peter Lang, Bern, p.207-224.

- Williams Ian A., 1996, «A translator’s reference needs: dictionaries or parallel texts», in Target, n°8, p.277-299.

- Zanettin Federico, 1998, «Bilingual Comparable Corpora and the Training of Translators», in Meta, vol.43, n°4, p.613-630.

- Zanettin Federico, 2001, «Swimming in Words: Corpora, Translation and Language Training», in Aston Guy (ed.), Learning with Corpora, Athelstan, p.177-197.

- Zanettin Federico, Bernardini Silvia, Stewart Dominic, 2003, Corpora in translator education, Manchester, St Jerome.

Using Corpora in Translation Studies:

- Adam Jean-Michel, 1997, Les textes: types et prototypes. Récit, description, argumentation, explication et dialogue, Paris, Nathan.

- Baker Mona, 1993, «Corpus linguistics and translation studies. Implications and Applications», in Baker Mona, Francis G., Tognini-Bonelli E. (eds), Text and Technology: In honour of John Sinclair, Amsterdam, John Benjamins, p.17-45.

- Baker Mona, 1995, «Corpora in translation studies. An overview and some suggestions for future research», in Target 7, p.223-243.

- Biber Douglas, Conrad S., Reppen R., 1998, Corpus linguistics. Investigating language structure and use, Cambridge, Cambridge University Press.

- Chomsky Noam, 1957, Syntactic Structures, Berlin, New York, Mouton de Gruyter.

- Habert Benoît, Nazarenko Adeline, Salem André, 1997, Les linguistiques de corpus, Paris, Armand Colin.

- Hunston Susan, 2002, Corpora in Applied Linguistics, Cambridge, Cambridge University Press.

- Kennedy Graeme, 1998, An Introduction to Corpus Linguistics, London, Longman.

- Kocijančič Pokorn Nike, 2008, «Translation and TS research in a culture using a language of limited diffusion: the case of Slovenia», in The Journal of Specialised Translation, 10, p.2-9.

- Laviosa Sara (ed.), 1998, «L'approche basée sur le corpus / The corpus-based approach», in Meta 43(4), p. 474-479.

- Laviosa Sara, 1998, «Core patterns of Lexical Use in a Comparable Corpus of English Narrative Prose», in Meta 43(4), p.557-570.

- Malmkjaer Kirsten (ed.), 2004, The Linguistics Encyclopedia, London, Routledge.

- Olohan Maeve, 2004, Introducing Corpora in Translation Studies, London, New York, Routledge.

- Schlamberger Brezar Mojca, 2005, «Problem of information in French and Slovene short newspapers articles», in Formal, functional and typological perspectives on discourse and grammar, Valencia, SLE, p. 214-215.

Corpus Linguistics:

- Aston, G. (1999) 'Corpus use and learning to translate', Textus,12, 289-313.

- Baker, M. (1999) 'The role of corpora in investigating the linguistic behaviour of translators', International Journal of Corpus Linguistics, 4: 281-298.

- Baroni, Marco & Bernardini, Silvia. 2004. BootCaT: Bootstrapping corpora and terms from the web. Proceedings of LREC 2004.

- Beeby, Allison, Rodriguez Inés, Patricia and Sánchez-Gijón, Pilar (eds). 2009. Corpus Use and Translating. Amsterdam: Benjamins.

- Bowker, Lynne. 2001. ‘Towards a Methodology for a Corpus-Based Approach to Translation Evaluation’. Meta, XLVI, 2.

- Bowker, Lynne and Pearson, Jennifer. 2002. Working with Specialized Language: a Guide to Using Corpora. London: Routledge.

- Brown, P., Lai J. & Mercer R. 1991. «Aligning sentences in parallel corpora». In 29th Annual Meeting of the Association for Computational Linguistics, Berkeley, CA. 169-176.

- Castagnoli, Sara, Ciobanu, Dragos, Kübler, Natalie, Kunz, Kerstin, Volanschi, Alexandra “Designing a Learner Translator Corpus for Training Purposes”. In N. Kübler. (ed) (2011) Corpora, Language, Teaching, and Resources: From Theory to Practice. Bern: Peter Lang. 221-248.

- Corpas Pastor, Gloria and Seghiri, Myriam (2009) 'Virtual corpora as documentation resources: translating travel insurance documents (English-Spanish)', in A. Beeby, P. Rodriguez Inés and P. Sánchez-Gijón (eds) (2009) Corpus Use and Translating. Amsterdam: Benjamins.

- Frankenberg-Garcia, A. & Santos, D. 2003. Introducing COMPARA, the Portuguese-English parallel translation corpus. In F. Zanettin, S. Bernardini & D. Stewart (eds.) Corpora in Translator Education, Manchester: St. Jerome Publishing. 71-87.

- Kübler, Natalie & Aston, Guy. 2010. “Using Corpora in Translation”. In M. McCarthy & A. O'Keefe (eds) The Routledge Handbook of Corpus Linguistics, London: Routledge. 505-515.

- Kübler, Natalie 2011. ‘Working with different corpora in translation teaching’. In Ana Frankenberg-Garcia, Lynne Flowerdew, and Guy Aston (eds) New Trends in Corpora and Language Learning. London: Continuum. 62-80

- Kübler, Natalie. 2004. ‘Using WebCorp for building specialized dictionaries’. In Aijmer K. (ed.) Proceedings of the ICAME Conference, May 2002, Göteborg, Sweden. Amsterdam: Rodopi.

- Kübler, Natalie. 2003. Corpora and LSP translation. In Zanettin, F., S. Bernardini & D. Stewart (eds), Corpora in Translator Education. Manchester: St Jerome.

- Maia, Belinda (1997). 'Do-it-yourself corpora ... with a little bit of help from your friends!' in Barbara Lewandowska-Tomaszczyk and Patrick James Melia (eds) PALC'97 Practical Applications in Language Corpora. Lodz: Lodz University Press. 403-410.

- Maia, Belinda (2002) ‘Corpora for Terminology Extraction – the Differing Perspectives and Objectives of Researchers, Teachers and Language Services Providers’. In Elia Yuste-Rodrigo (ed) Proceedings of the Workshop Language Resources for Translation Work and Research. LREC 2002, Las Palmas de Gran Canaria, Spain.

- Mauranen, Anna. 2007. 'Universal tendencies in translation', in M. Rogers and G. Anderman (eds) Incorporating Corpora: The Linguist and the Translator. Clevedon: Multilingual Matters.

- McEnery, Anthony & Xiao, Richard. 2007. Parallel and comparable corpora: What are they up to? In Incorporating Corpora: Translation and the Linguist (Translating Europe). Multilingual Matters Ltd, Clevedon, UK.

- Olohan, Maeve. 2004. Introducing Corpora in Translation Studies, London: Routledge.

- Pearson, Jennifer. 1998. Terms in Context. Amsterdam: John Benjamins Publishing Company

- Renouf, Antoinette. 2009. Corpus Linguistics beyond Google: the WebCorp Linguist’s Search Engine. Digital Studies, vol. 1 (1). Accessed 5 January 2013.

- Renouf, Antoinette. 2002. ‘WebCorp: providing a renewable data source for corpus linguists.’ Extending the scope of corpus-based research: new applications, new challenges. Ed. S. Granger and S. Petch-Tyson. Atlanta, GA: Rodopi.

- Varantola, K. (2000) “Translators, Dictionaries and Text Corpora”, in Bernardini, S. & Zanettin, F (eds) (2000) pp 117-133.

- Zanettin, Federico (1998). ‘Bilingual Comparable Corpora and the Training of Translators’. Meta, XLIII, 4.

- Zanettin, F., Bernardini, S. and Stewart, D. (eds) (2003) Corpora in Translator Education, Manchester: St. Jerome.

Discourse analysis, Corpora and Translation Studies:

- Arnold, D. J., L. Balkan, S. Meijer, R.L. Humphreys & L. Sadler. 1994. Machine Translation: an Introductory Guide. London: Blackwells-NCC.

- Halliday, M.A.K. 1994. An Introduction to Functional Grammar (2nd edition). London: Edward Arnold.

- Halliday, M.A.K. & R. Hasan. 1976. Cohesion in English. London: Longman.

- Hoey, M. 1983. On the Surface of Discourse. London: George Allen and Unwin.

- Hoey, M. 1991. Patterns of Lexis in Text. Oxford: Oxford University Press.

- Howatt, A.P.R. 1984. A History of English Language Teaching. Oxford: Oxford University Press.

- Hunston, S. 2002. Corpora in Applied Linguistics. Cambridge: CUP.

- Kennedy, G. 1998. An Introduction to Corpus Linguistics. London: Longman.

- McCarthy, M. & R.Carter. 1994. Language as Discourse. Harlow: Longman.

- Widdowson, H.G. 2007. Discourse analysis. Oxford: OUP.

- Williams, G. (ed.). 2005. La linguistique de corpus. Rennes: Presses Universitaires de Rennes-RUOA.

- Williams, G. 2007. «De l’architecture des sources à l’architecture de l’entrée: le rôle du corpus». In Dotoli, G. (ed.) 2007. L’Architecture du Dictionnaire Bilingue et le Métier du Lexicographe, Actes du Colloque International de Capitolo-Monopoli, 16-17 April. Fasano: Schena. 39-53.

- Williams, G. 2006. «La linguistique et le corpus: Une affaire prépositionnelle». Texto, revue de linguistique en ligne.

- Wynne, M. (ed). 2005. Developing Linguistic Corpora: A Guide to Good Practice. Oxford: AHDS.

Corpora and Tools:

- Corpus COMPARA:

- Corpus HANSARD

- Corpus Europarl:

- Corpus Cosmas:

- Corpus MeLLANGE:

- Corpus Linguee:

- WebCorp:

- Sketch Engine:

- The Web as Corpus: