October 26, 2020
Translation process
Translating an IT product into other languages is not an easy task. In this article, we will explain how we localize the Unidata platform and its documentation for English-speaking audiences.

This project has its own specifics, and many of the techniques that can be found on the Internet do not apply to it. The need for translation arose at the end of 2019, when Unidata Community Edition, the core of the open source Unidata platform, was released. Now that the platform is publicly available and Community Edition keeps developing, the community needs documentation in English as well.

What are we translating?
  • The Product. A large, complex system that has many screens, each filled with text. The product is changing constantly.
  • The Documentation. Eight main documents, which together match the volume of the novel "The Master and Margarita" (about 110 thousand words). The documents keep growing and evolving as they follow the product.
Preparing for translation

The first step we took was creating a thesaurus, since it is the basis of the entire translation. The format of the thesaurus gives us:

  • Term list (in the required languages).
  • Term descriptions (in the required languages).
  • Links between related terms.
  • The context in which each term is used. If a single term is used in multiple contexts, this is indicated.
  • Term grouping by subject. For example, everything that concerns data classification forms a separate group.
  • As a bonus, when compiling the thesaurus, we were able to fully verify that our MDM system complies with the master data management concept described in DAMA-DMBOK.
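
To make the thesaurus format more concrete, here is a minimal sketch of how one entry could be modeled. This is an illustration only, not the actual format used for Unidata: the `Term` class, its field names, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Term:
    """One thesaurus entry. All field names are hypothetical."""
    key: str                                   # internal identifier of the term
    names: Dict[str, str]                      # language code -> term
    descriptions: Dict[str, str]               # language code -> description
    group: str                                 # subject group
    contexts: List[str] = field(default_factory=list)  # where the term is used
    related: List[str] = field(default_factory=list)   # keys of related terms


def terms_in_group(terms: List[Term], group: str) -> List[Term]:
    """Return every term that belongs to the given subject group."""
    return [t for t in terms if t.group == group]


def multi_context_terms(terms: List[Term]) -> List[Term]:
    """Return terms used in more than one context; these need extra care."""
    return [t for t in terms if len(t.contexts) > 1]


# Made-up sample entries, for illustration only.
sample_terms = [
    Term(
        key="classifier",
        names={"ru": "классификатор", "en": "classifier"},
        descriptions={"en": "A hierarchy used to classify records."},
        group="data classification",
        contexts=["administration screens", "data steward screens"],
        related=["classifier_node"],
    ),
    Term(
        key="classifier_node",
        names={"ru": "узел классификатора", "en": "classifier node"},
        descriptions={"en": "A single element of a classifier."},
        group="data classification",
        contexts=["data steward screens"],
        related=["classifier"],
    ),
]
```

Keeping context and group as explicit fields makes it easy to list every term of one subject area, or to flag the terms a translator should double-check because they appear in several contexts.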
With the thesaurus ready, we returned to the product itself. When the translation process began, the product already had an English version of the interface. We checked all the English and Russian texts and corrected the errors we found. This led to changes in the Russian documentation, but after these edits we are confident that there will be no surprises in our further work.
Work Process

Community Edition core development plans are built in such a way that you can’t just take finished documentation and translate it. First, we evaluated which content will definitely not change in the near future. Then we determined translation priorities and approximate deadlines for certain parts of the documentation.

Since our project team is small and we do not use single source documentation systems (such as LaTeX and DITA), most of the operations are currently done manually. Switching to a single source system could solve most of the issues that arise, but the transition and training of employees may take 3 to 4 months.

Taking into account all the above, we have established the following working scheme:

  • A technical writer divides each document into a series of articles, each describing a particular function of the system. Usually, these are pre-existing sections and subsections. To estimate labor costs, we immediately calculate the volume of the articles. The estimation method is described below.
  • The technical writer assigns several articles to a less experienced translator.
  • The translator uploads the document to a computer-assisted translation (CAT) system. A translation memory can be created in that system: a set of previously translated text segments, including translations of terminology and frequently used phrases. This function alone speeds up the translation process and improves its quality.
  • The completed article is then passed to an experienced translator for proofreading. Proofreading improves the quality of the text and catches the remaining mistakes. The experienced translator also gives feedback on typical errors, which helps the less experienced specialist learn and improve.
  • The technical writer receives the translated text. The results are included in both traditional PDF files and online help pages.
  • Our QA engineers and the frontier system users provide feedback on the quality of translation and content, which gives additional opportunities to improve the quality of work.
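
The translation memory mentioned in the scheme above can be illustrated with a small sketch. It is not how any particular CAT system works internally: the `TranslationMemory` class, the 0.75 similarity threshold, and the sample segments are assumptions made for illustration, and the fuzzy matching relies on `difflib.SequenceMatcher` from the Python standard library.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """A toy translation memory: stores source segments with their translations."""

    def __init__(self, threshold=0.75):
        # The threshold is an assumed cut-off for accepting a fuzzy match.
        self.threshold = threshold
        self.segments = {}

    def add(self, source, target):
        """Remember the translation of one source segment."""
        self.segments[source] = target

    def lookup(self, source):
        """Return (translation, score); translation is None if nothing is close."""
        # An exact match is the best possible result.
        if source in self.segments:
            return self.segments[source], 1.0

        # Otherwise look for the most similar stored segment.
        best_target, best_score = None, 0.0
        for known, target in self.segments.items():
            score = SequenceMatcher(None, source, known).ratio()
            if score > best_score:
                best_target, best_score = target, score

        if best_score >= self.threshold:
            return best_target, best_score
        return None, best_score


# Made-up segments, for illustration only.
tm = TranslationMemory()
tm.add("Сохраните запись.", "Save the record.")
tm.add("Откройте карточку записи.", "Open the record card.")

exact = tm.lookup("Сохраните запись.")   # stored segment, exact match
fuzzy = tm.lookup("Сохраните записи.")   # one letter differs, fuzzy match
missing = tm.lookup("xyz")               # nothing similar is stored
```

A fuzzy match is only a suggestion: the translator still has to adapt the stored translation to the new segment, which is one reason the proofreading step stays in the process.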
Labor cost estimation
To estimate the translation time, we use the standard translation rate: 1800 characters (with spaces) of translated text per hour.

This figure, of course, changes for texts of different genres and complexity. We have a fairly simple technical text, without complicated wording, abbreviations and lyrical digressions, so "average temperature for the given hospital" (i.e. average numbers), as we say in Russia, is quite suitable.

Next, we adjust the numbers depending on the level of the translator. The figure of 1800 characters per hour is valid for an experienced specialist. But how can you find out what the norm is for a novice translator without practical observations? We arrived at the following formulas:
  • For an experienced translator, the output rate per working day is: 1800 characters * 8 hours = 14,400 characters.
  • For a novice translator, the output rate per working day is: 1800 characters * 5 hours = 9,000 characters.

We made the pessimistic assumption that a beginner translates about 40% less than an experienced translator, so we set a beginner's daily output equal to 5 hours of an experienced translator's work (5 hours instead of 8 is 37.5% less). This allowance covers the time a beginner spends learning to translate, forming their own method of work, mastering tools, clarifying unclear points, and so on.

On top of this estimate, you can also add up to 25% of the time for meetings, thinking, small urgent tasks, etc.
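
The arithmetic above can be written down as a short calculation. This is only a sketch of the estimate described in this section: the function names and the 90,000-character sample volume are made up for illustration.

```python
CHARS_PER_HOUR = 1800  # standard rate, characters with spaces

# Effective hours per working day at the standard rate.
EFFECTIVE_HOURS = {
    "experienced": 8,
    "novice": 5,  # pessimistic assumption for a beginner
}


def daily_output(level):
    """Characters translated per working day for the given translator level."""
    return CHARS_PER_HOUR * EFFECTIVE_HOURS[level]


def estimate_days(char_count, level, overhead=0.25):
    """Working days needed for a text, with extra time for meetings and small tasks."""
    base_days = char_count / daily_output(level)
    return base_days * (1 + overhead)


# Sample volume, chosen only for illustration.
volume = 90_000
days_novice = estimate_days(volume, "novice")            # 12.5 working days
days_experienced = estimate_days(volume, "experienced")  # 7.8125 working days
```

The 25% overhead is an upper bound in the text, so passing a smaller `overhead` value gives the more optimistic end of the estimate.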
How can the process be improved?
We studied the experience of organizations involved in localization, and identified several interesting practices that could be suitable for us.
  • Creating a resource for collaborative translations. Some open source projects enlist their users to help translate the interface and documentation. User engagement is rewarded, and the reward doesn't have to be monetary: it may include additional content, rating, internal currency, and so on. Many wiki sites use this approach.
  • Embedding translation in the development process. Currently the product has several release branches, and descriptions for them are added after new features have been developed. The new approach should change this situation so that documentation is created in two languages at once at the time of release. This may need to be implemented through integration with the CAT system (for example, via the Serge connector). To date, this point leaves us with more questions than answers.
  • Using a single source documentation system. Such a system will simplify versioning, tracking changes in the two localizations, and generating the final format.

Of course, in order to implement new tools and practices, it is necessary to reach a certain critical mass, that is, the moment when current methods become too labor-intensive. The process will be optimized in stages, with increasing demand for bilingual content.

Although we are only at the beginning of the journey, we have tried to share our experience and thoughts. There is still a lot to analyze, new tools to implement, and new problems to learn to solve. You can follow the development of the Unidata Community Edition on our website. We will publish news at the end of important stages of work on the product.
Daniko I.