Software Translation Process using Alchemy Catalyst
Version 2.0
Kris Kniaz
February 9, 2006
Copyright © by Kris Kniaz
Translation Process
Revision History
Version / Date / Author(s) / Description
1 / 12/5/2005 / Kris Kniaz / Initial structure.
2 / 2/9/2006 / Kris Kniaz / Resource file translation scenarios added.
Filename: Translation Process.doc
Introduction
Translation is a crucial part of the software localization process. Every organization that wants to provide global software or internet services must deal with this aspect of an international rollout. The task is relatively simple if the application is properly designed and built for localization, but in the complex environment of a software project it still creates additional dependencies and productivity issues for the team:
1) Translations should be done in context: a language specialist who fully understands how the text appears on the screen, in the context of the user action, will invariably produce a better translation.
2) Applications (especially those originally written for the English market) are very often not ready for translation: sentences are broken into separate strings that are put together by code, strings are reused in different contexts (for example fax as a noun and fax as a verb), and functional content is not separated from the code (for example, error messages are often stored as string constants or variables in the code).
3) Translation must integrate with the standard engineering build, test and deployment processes. Functional content such as button names, menu options and error messages obviously requires translation, but it is connected to the code (and very often is the code). In Microsoft .NET, localized program strings should be kept in resource files (a.k.a. resx files). Those files are compiled with the code and deployed as satellite assemblies by the engineers during the build process.
Most organizations start with inefficient, “manual” processes that usually involve exporting functional strings to Excel spreadsheets or Word files and then importing the translated files manually or through scripts.
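To make the resx format mentioned above more concrete, the following is a minimal Python sketch of how translatable strings could be pulled out of a resx file. The sample content and the resource names (SendFaxButton, FaxMenuItem) are hypothetical, and the sketch assumes a simplified file that holds only string entries; real resx files also carry headers and may contain non-string resources.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down resx content used only for illustration.
SAMPLE_RESX = """<root>
  <data name="SendFaxButton" xml:space="preserve">
    <value>Send fax</value>
  </data>
  <data name="FaxMenuItem" xml:space="preserve">
    <value>Fax</value>
    <comment>Noun: the menu that lists received faxes</comment>
  </data>
</root>
"""


def read_resx_strings(resx_text):
    """Return {resource name: string value} for every <data> element."""
    root = ET.fromstring(resx_text)
    strings = {}
    for data in root.findall("data"):
        value = data.find("value")
        # A missing or empty <value> is treated as an empty string.
        has_text = value is not None and value.text
        strings[data.get("name")] = value.text if has_text else ""
    return strings


if __name__ == "__main__":
    for name, text in read_resx_strings(SAMPLE_RESX).items():
        print(name, "=", text)
```

The comment element in the second entry shows one way a developer can give the translator the context (noun versus verb) that the string alone does not carry.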
Mature organizations implement translation and localization management software from one of the leading vendors such as Trados, Idiom Technologies or Alchemy.
The translation software is built around the concept of a Translation Memory (TM), which is defined as a “collection of units of associated text strings in language pairs from previous translations which can be suggested to translators translating similar content and language pair document.”
Management of the localization process revolves around storing, indexing and reusing Translation Memories. In the typical case a translation engineer creates a project for the base (neutral) language that contains base-to-base TMs. The base project is then used to create base-to-target projects. As a result one typically ends up with a collection of parent and child TM projects.
This paper presents typical translation processes for a .NET application, implemented using the Alchemy Catalyst product from Alchemy Software (www.alchemy.ie).
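The Translation Memory idea described above can be illustrated with a small Python sketch. It is not how Catalyst or any other vendor implements matching; it simply assumes a TM held as a dictionary of source strings to translations and uses the standard library difflib module to suggest previous translations for similar text. The function name, the sample strings and the 0.75 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def suggest(memory, source_text, threshold=0.75):
    """Return (stored source, translation, score) tuples, best match first.

    memory maps previously translated source strings to their translations.
    """
    suggestions = []
    for stored_source, translation in memory.items():
        score = SequenceMatcher(
            None, source_text.lower(), stored_source.lower()
        ).ratio()
        if score >= threshold:
            suggestions.append((stored_source, translation, round(score, 2)))
    return sorted(suggestions, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    # Hypothetical English-to-German memory.
    memory = {"Send fax": "Fax senden", "Cancel": "Abbrechen"}
    print(suggest(memory, "Send fax"))    # exact match
    print(suggest(memory, "Send a fax"))  # similar text, offered as a suggestion
```

An exact match scores 1.0, while a near match is returned with a lower score so that the translator can decide whether the earlier translation still applies.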
Overview of the Catalyst product
The main features of Catalyst are summarized below:
Feature / Explanation
TM storage / In Catalyst, Translation Memories are stored in the project file (extension ttk). This has benefits and drawbacks: one can easily share TM information by simply emailing the project file; on the other hand, the lack of a centralized repository (such as the database offered by other vendors) creates file management overhead.
Comparison Expert / The Comparison Expert is used to compare two application files. It detects missing, added and modified resources and records these changes in the Results Toolbar and an optional XML report file. This Expert is useful in determining the scope of change between revisions of software.
Leveraging / Alchemy CATALYST provides Translation Memory technology called ezMatch™. ezMatch allows translators to re-use previous translations. The Leverage Expert guides users through using ezMatch technology and is designed to maximize the amount of translations that can be leveraged from multiple file formats.
Pseudo Translation / Pseudo Translation simulates the effects of translation on the application files. It does this by substituting vowel characters in the source files with diacritical or accented characters. Pseudo Translation can be used early in the product development cycle to determine whether an application can be translated easily. For example, one may use it to determine whether a product crashes when a series of strings is translated, or whether the strings will still fit if they expand by 15% during translation.
Validation / Validation automates the detection of common localization errors normally introduced during the translation process. The Validate Expert also has a companion technology, the Runtime Validation Expert, which allows users to validate applications as they run on the Windows desktop.
Power Translation / Power Translation is used to automate the lookup and translation of translation units in the active Project TTK file. It operates on translation units and helps the translator locate and translate matching terms in the active translation memories.
Spell checking / Custom dictionaries
Database Connector / Ability to create TMs by reading content directly from a database through a provided connector
License Management / Either through an additional License Manager or through the built-in ability to borrow a license for up to two weeks.
ezParse / Ability to define custom XML schemas for non-standard resource files
Command line interface / Most of the features of Catalyst can be called from the command line. This feature could be used during the build process.
Free translation version / Catalyst has a free, stripped-down version of its software. This version can be used by translators who do not require advanced features such as leveraging or power translation. The free version reduces the overall TCO of the product.
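The Pseudo Translation feature in the table above can be approximated with a few lines of Python. This is only a sketch of the general technique (accented vowels plus padding for the 15% expansion), not Catalyst's actual algorithm; the bracket markers and the "~" padding character are assumptions chosen so that truncated strings are easy to spot on screen.

```python
import math

# Map plain vowels to accented equivalents; all other characters are untouched.
ACCENTED = str.maketrans("aeiouAEIOU", "áéíóúÁÉÍÓÚ")


def pseudo_translate(text, expansion=0.15):
    """Accent the vowels, pad the text by the expansion ratio, add brackets."""
    padded_length = math.ceil(len(text) * (1 + expansion))
    padding = "~" * (padded_length - len(text))
    return "[" + text.translate(ACCENTED) + padding + "]"


if __name__ == "__main__":
    print(pseudo_translate("Send fax"))  # [Sénd fáx~~]
```

If the closing bracket is missing when the application runs, the control is too narrow for the expanded text; if a string appears without accents, it is probably hard-coded rather than loaded from a resource file.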
Creation of Master Translation Project file
The first task in the localization process is the creation of the base application. This may or may not be a “real” (that is, deployed or even deployable) application, especially if your website offers different features to different user profiles. You should create your base application using a neutral culture (in Microsoft .NET localization terminology); a decent compromise is an English locale:
Figure 1
In this process (see Figure 1) resource files (resx and custom XML) are imported into Catalyst. For custom localization files (custom XML schemas etc.) the import rules must be entered into Catalyst via the ezParse interface before the import.
There are two options for importing files: file import and file/folder import. The latter is preferable because it preserves the structure of the source directories, so Catalyst is able to replicate the source file structure during export.
Figure 2 shows the initial view of the project after import. Catalyst parses each file and extracts strings. Even though the system treats each string pair as an independent unit of work (Translation Memory) it remembers the relationship between strings, parent files and folders. This is shown in the navigational area on the left.
The initial project settings specify English as both the source and the target language. This project needs to be treated as the master for all other projects. In the case of a new country rollout we will use it as the basis for translation into a specific language.
Figure 2
New country rollout
The first step in the new country rollout process is the creation of the Catalyst project file for the specific TM pairing. The source language should always be left as your base locale (for instance en-GB); the target language needs to be set in the locale navigational area.
The project file name should reflect the specific language pair. Therefore our naming convention is: [project actual name].[Source locale].[Target locale].ttk.
For example:
· Website.en-GB.en-GB.ttk is the master file for the UK code base
· Website.en-GB.de-DE.ttk contains the translation into German
The locale-specific project file is shipped to translators (via the QuickShip option, which essentially creates a self-extracting executable of the Catalyst project file), who work on the file using the translator/lite version of the system. Before the file is shipped, the translation process owner should “lock” certain strings, i.e. prevent them from being translated. Generally this should be rare; however, it particularly applies in cases where resource files are used to store business logic.
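The naming convention above is simple enough to enforce in build or housekeeping scripts. The Python helpers below are a sketch under that assumption; the function names are hypothetical and nothing here is part of Catalyst itself.

```python
def project_file_name(project, source_locale, target_locale):
    """Build [project].[source locale].[target locale].ttk."""
    return f"{project}.{source_locale}.{target_locale}.ttk"


def parse_project_file_name(file_name):
    """Split a project file name into (project, source locale, target locale)."""
    # rsplit keeps dots inside the project name intact.
    parts = file_name.rsplit(".", 3)
    if len(parts) != 4 or parts[3] != "ttk":
        raise ValueError(f"not a valid project file name: {file_name}")
    return parts[0], parts[1], parts[2]


def is_master(file_name):
    """A master project has identical source and target locales."""
    _, source_locale, target_locale = parse_project_file_name(file_name)
    return source_locale == target_locale


if __name__ == "__main__":
    print(project_file_name("Website", "en-GB", "de-DE"))
    print(is_master("Website.en-GB.en-GB.ttk"))
```

A check such as is_master makes it easy for a script to tell the master project apart from its localized children when walking a folder of ttk files.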
Figure 3
After receiving the translated file from the translator, a senior language resource should approve all translations within the tool (which changes the visual status of every string from an eye to a checkmark) and hand the file back to the engineers.
Figure 4
In the last part of the process the locale-specific resx files are extracted and included in the next build. Alternatively, the approval of the translated strings can happen after the development-stage or development-integration build (depending on your process) has been approved. After the build, language QA takes place in the translation testing environment. If issues are found, defects are entered and the translation process repeats.
After launch you must maintain the locale-specific TMs, most appropriately in the SCM system since they represent your reference data. This adds the overhead of keeping the TTK files in sync between the master and localized versions. You should automate this process using Alchemy’s command-line interface.
Parallel translation process
For very large websites or applications the translation will not be handled as a single unit; it will be divided into chunks of work that can be handled by several translators at the same time. Catalyst supports this scenario with the “section” concept. The idea is to export parts of the site into “subprojects” to be sent to translators. After translation the sections must be imported back into the main project and the section projects should be discarded.
Figure 5
Please note that the original names (and extensions) of the resource files that were loaded into the master project will be preserved in the localized projects. This is not a big problem because files with the correct names and extensions can always be exported from the project file. Alternatively we could maintain a neutral version of each resx file (with no language extension) and use it as the master. Figure 5 shows the parallel translation process in detail. Depending on the project timeline we should export the main project into as many section projects as is appropriate. Those section projects are handled by translators and translator managers. Translation QA of the parallel translation process should follow the main workflow illustrated in Figure 4. Your translation build schedule should be aligned with the delivery of translated chunks.
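When deciding how to carve the main project into sections, one simple planning aid is to balance the number of strings per section. The Python sketch below is only an illustration of that idea and is independent of Catalyst: it assumes the string count per resource file is already known, uses hypothetical file names, and applies a greedy "largest file to the lightest section" rule.

```python
def split_into_sections(string_counts, section_count):
    """Distribute files over sections so that string totals are roughly even.

    string_counts maps a resource file name to its number of strings.
    """
    sections = [{"files": [], "strings": 0} for _ in range(section_count)]
    # Place the largest files first; each goes to the currently lightest section.
    ordered = sorted(string_counts.items(), key=lambda item: item[1], reverse=True)
    for name, count in ordered:
        lightest = min(sections, key=lambda section: section["strings"])
        lightest["files"].append(name)
        lightest["strings"] += count
    return sections


if __name__ == "__main__":
    counts = {"home.resx": 120, "checkout.resx": 80, "help.resx": 60, "account.resx": 40}
    for number, section in enumerate(split_into_sections(counts, 2), start=1):
        print(number, section["files"], section["strings"])
```

In practice sections would more likely follow the folder structure of the site, so that each translator sees related pages together; the string totals are then mainly useful for estimating delivery dates per section.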
Figure 6 and Figure 7 show screens for importing and exporting sections from Catalyst.
Figure 6
Figure 7
Maintenance – adding new page or section
In this scenario we assume that there is an existing master translation project as well as one or several projects containing TMs for live non-English sites. As mentioned before, we need to keep the master and localized versions in sync. Therefore we would add the new folder or new page.en-GB.resx file to all ttk projects at the same time and start parallel translation efforts (by exporting sections) when appropriate and congruent with the release schedule.
Maintenance – Adding new string to existing resource file
Catalyst manages the translation of the application TMs. It does not manage the application strings themselves; therefore one cannot add or remove strings inside Catalyst. If we need to add a new application string to a resource file we need to follow a slightly different process, illustrated in Figure 8. A developer needs to remove the old resx file from the master project and re-import the new version. Subsequently all localized projects need to be recreated (by changing the target language and saving the master under localized names according to our naming convention). Engineers then need to leverage (see Figure 9) the translated TMs from the old localized projects into the new versions. Untranslated strings then need to be exported to translators and, once the translated TMs are received, re-imported into the new projects. Finally the local resx files must be exported and the application rebuilt and QA’d.
Figure 8
Figure 9
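The leverage step in this scenario can be pictured with the following Python sketch. It is a conceptual illustration only, not Catalyst's Leverage Expert: it assumes each project is reduced to a dictionary of resource ids to text, and all names and sample strings are hypothetical.

```python
def leverage(new_source, old_source, old_translations):
    """Carry existing translations over to a re-imported resource file.

    new_source and old_source map resource ids to base-language text;
    old_translations maps resource ids to translated text.
    Returns (leveraged, pending): reused translations and strings that
    still have to be sent to translators.
    """
    # Index old translations by source text so unchanged strings are reused.
    by_text = {
        old_source[key]: old_translations[key]
        for key in old_source
        if key in old_translations
    }
    leveraged, pending = {}, {}
    for key, text in new_source.items():
        if text in by_text:
            leveraged[key] = by_text[text]
        else:
            pending[key] = text
    return leveraged, pending


if __name__ == "__main__":
    old_source = {"Save": "Save", "Title": "Welcome"}
    old_translations = {"Save": "Speichern", "Title": "Willkommen"}
    new_source = {"Save": "Save", "Title": "Welcome back", "Print": "Print"}
    print(leverage(new_source, old_source, old_translations))
```

Matching on text alone ignores context, so a string such as "Fax" used once as a noun and once as a verb would receive the same translation; this is one reason a language review is still needed after leveraging.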
Admittedly the overhead of this process is huge; therefore we should add a number of “spare” strings to every resource file to avoid it.
Issues with Using Alchemy in Translation Projects
No software tool is perfect, and we have to deal with the limitations and compromises made by its designers and architects. Alchemy is no exception, so when working with it be mindful of a few pitfalls:
1. Leverage once: The biggest efficiency gain from applying Alchemy on projects is its ability to reuse translation memories in the process called “leveraging”. Once certain key phrases or dictionaries are translated, the resulting TMs can be used to translate the same phrases occurring in other documents. In this way we can achieve both significant efficiency gains and consistency in translating the terms. Alchemy supports this process well; unfortunately, if rework of the “base” is required the model breaks, because in Alchemy leveraging can be done only once. Hence changes in the base require that you discard already translated sections and leverage everything from the beginning.
2. Workflow history is often lost: The translation process is workflow driven, so preserving the history of who translated and approved a particular TM is an important feature. It appears that the workflow history is maintained by the Alchemy database, so when sections are merged back into the bigger database the individual history is often lost.
3. Alchemy databases can grow big: Translated TMs should be kept in the SCM system as a “trusted” source. This works relatively well for applications with a limited number of strings. For applications or websites with large numbers of strings the Alchemy databases grow to many MBs, presenting logistical challenges. A centralized, database-driven system might do a better job.