On the usefulness of parallel corpora for contrastive linguistics. A multivariate corpus study of meaning shifts in the semantic field of inchoativity.

This study explores to what extent parallel corpora are suitable for contrastive linguistic research. Under the assumption of semantic stability in the translational process (translators are thought to transfer the meaning of the source text), contrastive researchers often use parallel corpora (i.e. corpora containing source texts and their translations) to study similarities and differences in system and usage on all kinds of linguistic levels [1], [2], [3], [4]. Contrastive linguists (like [5]) have already drawn attention to the potential impact of the translational process on formal characteristics of translated texts (e.g. in terms of normalization and shining through [6]), which reduces the representativeness of translations for the language under study, but the question whether translation affects the meaning of target-text words and constructions has rarely been asked.

If it can be shown that the translational process causes (subtle) semantic shifts in the target language, this would undermine one of the core assumptions in using parallel corpora in contrastive linguistics, namely the assumption that there is perfect semantic equivalence between source texts in language A and target texts in language B.

Therefore, in this study we investigate the impact of source texts in a given language on the meaning of target-text words and constructions. To do so, we focus on the meaning structure of five Dutch lexemes in the semantic field of inchoativity (“beginnen”, “starten”, “van start gaan”, “opstarten” and “aanvatten”). More particularly, the meaning structure of these lexemes in a parallel corpus of English-to-Dutch translations is compared to that in a comparable corpus of authentic Dutch texts. Both corpora are included in the Dutch Parallel Corpus [7], a 10-million-word corpus of Dutch, French and English, consisting of different genres. The lexemes were selected by means of the semantic mirroring procedure [8]; after they were extracted from the corpus, the behavioral-profile method was adopted [9], [10]. This usage-based method builds on the idea from distributional semantics that one can grasp the meaning of a word by looking at its (linguistic) context, and it can be used to measure semantic differences between closely related words (the more similar their linguistic contexts are, the more semantically similar two words are). In particular, we annotated the linguistic context of each retrieved lexeme for a variety of so-called ID-tags, such as animacy and concreteness of the subject and object, mode of the verb, object type, semantic category of the modified verb and presence of a modifying verb. Taken together, these ID-tags represent the syntactic and semantic architecture of each individual lexeme. By means of various multivariate statistical techniques (such as correspondence analysis, cluster analysis and classification trees), this enables us to find out in which respects each lexeme is unique and whether the architecture of each lexeme remains stable when it is based on translational data (parallel corpus) compared to authentic data (comparable corpus).
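
The behavioral-profile idea described above can be illustrated with a minimal sketch. It is not the procedure used in the study: the occurrences, the tag names `subject_animacy` and `object_type`, their levels, and the choice of Manhattan distance are all assumptions made for illustration, whereas the study itself relies on correspondence analysis, cluster analysis and classification trees. The sketch turns annotated occurrences into one vector of relative frequencies per lexeme and corpus, and then measures how far the profile of each lexeme in translated text lies from its profile in authentic text.

```python
from collections import Counter, defaultdict

# Hypothetical toy annotations, NOT data from the study:
# (corpus, lexeme, {ID-tag: level}) for each retrieved occurrence.
OCCURRENCES = [
    ("authentic", "beginnen", {"subject_animacy": "animate", "object_type": "activity"}),
    ("authentic", "beginnen", {"subject_animacy": "inanimate", "object_type": "none"}),
    ("translated", "beginnen", {"subject_animacy": "animate", "object_type": "activity"}),
    ("translated", "beginnen", {"subject_animacy": "animate", "object_type": "activity"}),
    ("authentic", "starten", {"subject_animacy": "inanimate", "object_type": "none"}),
    ("authentic", "starten", {"subject_animacy": "animate", "object_type": "activity"}),
    ("translated", "starten", {"subject_animacy": "animate", "object_type": "activity"}),
    ("translated", "starten", {"subject_animacy": "inanimate", "object_type": "none"}),
]


def behavioral_profiles(occurrences):
    """Return (features, profiles).

    profiles maps (corpus, lexeme) to a list holding, for every observed
    (ID-tag, level) pair, the share of that lexeme's occurrences showing it.
    """
    counts = defaultdict(Counter)
    totals = Counter()
    for corpus, lexeme, tags in occurrences:
        key = (corpus, lexeme)
        totals[key] += 1
        for tag, level in tags.items():
            counts[key][(tag, level)] += 1
    # A fixed, sorted feature order makes all profile vectors comparable.
    features = sorted({f for c in counts.values() for f in c})
    profiles = {
        key: [counts[key][f] / totals[key] for f in features] for key in counts
    }
    return features, profiles


def manhattan(u, v):
    """Sum of absolute differences (an assumed, simple distance measure)."""
    return sum(abs(a - b) for a, b in zip(u, v))


features, profiles = behavioral_profiles(OCCURRENCES)
lexemes = sorted({lexeme for _, lexeme in profiles})

# Distance between the translated and the authentic profile of each lexeme.
shifts = {
    lex: manhattan(profiles[("translated", lex)], profiles[("authentic", lex)])
    for lex in lexemes
}

for lex in lexemes:
    print(lex, shifts[lex])
```

A distance of zero means that the two profiles are identical on the annotated ID-tags; larger values point to a lexeme whose contextual behavior differs between translated and authentic text. With real annotations, the same profile vectors could serve as input to the multivariate techniques named above.
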
The results indeed show that meaning shifts take place during translation, i.e. translation affects the semantic characteristics of words (and constructions). We therefore advise contrastive linguists to always compare findings retrieved from a parallel corpus with the results from a comparable corpus.

References

[1] Altenberg, B., & Granger, S. (Eds.). (2002). Lexis in contrast: corpus-based approaches (Vol. 7). John Benjamins Publishing.

[2] Granger, S. (2003). The corpus approach: a common way forward for Contrastive Linguistics and Translation Studies. Corpus-based approaches to contrastive linguistics and translation studies, 17-29.

[3] Ebeling, J., Ebeling, S. O., & Hasselgård, H. (2013). Using recurrent word combinations to explore cross-linguistic differences. Advances in Corpus-Based Contrastive Linguistics: Studies in Honour of Stig Johansson, 54, 177.

[4] Viberg, Å. (2012). Language-specific meanings in contrast: A corpus-based contrastive study of Swedish få ‘get’.

[5] Johansson, S. (2007). Seeing through Multilingual Corpora: On the use of corpora in contrastive studies (Vol. 26). John Benjamins Publishing.

[6] Vandepitte, S., & De Sutter, G. (2013). Contrastive Linguistics and Translation Studies. Handbook of Translation Studies, 4, 36.

[7] Macken, L., De Clercq, O., & Paulussen, H. (2011). Dutch Parallel Corpus: A balanced copyright-cleared parallel corpus. Meta, 56(2).

[8] Vandevoorde, L., De Sutter, G., & Plevoets, K. (2016). On semantic differences between translated and non-translated Dutch. Using bidirectional parallel corpus data for measuring and visualizing distances between lexemes in the semantic field of inceptiveness. Empirical Translation Studies. Interdisciplinary Methodologies Explored, 128–46.

[9] Gries, S. T., & Divjak, D. (2009). Behavioral profiles: a corpus-based approach to cognitive semantic analysis. New directions in cognitive linguistics, 57–75.

[10] Szymor, N. (2015). Behavioral Profiling in Translation Studies. trans-kom, 8(2), 483–498.