Proposal for translation workflows for both source code and documentation
Set up a translation process for both source code and documentation
OpenStack Manuals are in DocBook format. The source is on GitHub (http://github.com/openstack/openstack-manuals). Launchpad and Transifex are free web-based tools used for crowd translation. Both of them provide a simple web interface through which non-technical people can help with translation. They don't support the DocBook format, but they do support the popular GNU Gettext file formats (PO Template or PO).
To translate the OpenStack Manuals, which are in DocBook format, into multiple languages, we can slice the documents into short statements, use a web-based translation management tool to manage the translation process, and finally converge the translated content into new copies of the DocBook files.
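To illustrate the slicing idea, here is a minimal Python sketch. It is not the actual slicing program; the set of translatable tags, the function names, and the sample document are all assumptions made for illustration. It treats every title and para element as one translatable statement and writes a minimal POT body without the header entry.

```python
import xml.etree.ElementTree as ET

# Assumption: which DocBook elements count as translatable units.
TRANSLATABLE_TAGS = {"title", "para"}


def local_name(tag):
    """Strip an XML namespace such as '{http://docbook.org/ns/docbook}para'."""
    return tag.rsplit("}", 1)[-1]


def extract_messages(docbook_xml):
    """Return the unique translatable strings of a DocBook document, in document order."""
    root = ET.fromstring(docbook_xml)
    messages = []
    for element in root.iter():
        if local_name(element.tag) not in TRANSLATABLE_TAGS:
            continue
        # Collapse whitespace so that re-wrapping a paragraph does not change its msgid.
        text = " ".join("".join(element.itertext()).split())
        if text and text not in messages:
            messages.append(text)
    return messages


def po_escape(text):
    """Escape backslashes and double quotes for a PO string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_pot(messages):
    """Render messages as a minimal POT body (header entry omitted for brevity)."""
    entries = ['msgid "{}"\nmsgstr ""\n'.format(po_escape(m)) for m in messages]
    return "\n".join(entries)


# Hypothetical sample document, not taken from the real manuals.
SAMPLE = """<book xmlns="http://docbook.org/ns/docbook">
  <title>Compute Administration Manual</title>
  <chapter>
    <title>Getting Started</title>
    <para>OpenStack Compute manages
      virtual machines.</para>
  </chapter>
</book>"""

print(to_pot(extract_messages(SAMPLE)))
```

This sketch flattens inline markup inside a paragraph and would duplicate text if translatable elements were nested, so a real slicing program would need to handle both cases.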
Here are the five steps of the translation process:
- Step #1 Slicing: extract translatable content from the DocBook files and generate Gettext-compatible POT (PO Template) or PO files.
- Step #2 Uploading: upload the POT (or PO) files to a web-based translation management tool.
- Step #3 Downloading: download the PO (or MO) files from the web tool after translation and review.
- Step #4 Converging: converge the translated content into new copies of the DocBook files, creating DocBook files in multiple languages.
- Step #5 Generating: generate HTML/PDF in multiple languages from the translated DocBook files.
The picture in the attachment describes these steps. (See attached file: DocBook translation process.png)
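As an illustration of Step #4 (Converging), here is a minimal Python sketch under stated assumptions. The function name, the tag set, and the sample strings are hypothetical, and the translations are passed in as a plain dictionary instead of being read from a downloaded PO file. Untranslated statements fall back to the source text.

```python
import xml.etree.ElementTree as ET

# Assumption: the same tag set that the slicing step treats as translatable.
TRANSLATABLE_TAGS = {"title", "para"}


def normalize(text):
    """Collapse whitespace the same way the slicing step is assumed to."""
    return " ".join(text.split())


def converge(docbook_xml, translations):
    """Return a translated copy of docbook_xml.

    translations maps a source string (msgid) to its translation (msgstr).
    Missing or empty translations leave the source text in place.
    """
    root = ET.fromstring(docbook_xml)
    for element in root.iter():
        if element.tag.rsplit("}", 1)[-1] not in TRANSLATABLE_TAGS:
            continue
        if len(element) > 0 or element.text is None:
            # Elements with inline children are skipped in this sketch.
            continue
        translated = translations.get(normalize(element.text))
        if translated:
            element.text = translated
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample data for illustration only.
SAMPLE = (
    "<chapter><title>Getting Started</title>"
    "<para>OpenStack Compute manages virtual machines.</para></chapter>"
)
FRENCH = {"Getting Started": "Prise en main"}

result = converge(SAMPLE, FRENCH)
print(result)
```

Running one such pass per target language would produce one DocBook copy per language, which is the input that Step #5 needs.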
Comparison of Launchpad and Transifex
Launchpad (https://launchpad.net/) and Transifex (https://www.transifex.net/) are similar web-based tools used for crowd translation. The goal of the comparison is to find the most appropriate tool for this scenario. The comparison is made between Launchpad and the free version of Transifex for open source projects. (Refer to https://www.transifex.net/plans/ for details of the Transifex free version for open source projects.)
Based on the requirements for manuals translation, the following aspects are taken into consideration:
- Supported format
- DocBook slicing support
- Converging support
- Source uploading method
- Output downloading method
- Translation Memory support
- Translation history support
- Change management
- Translation Dictionary
Refer to Table 1 for the details of the comparison.
Another important measure to compare is the workload. Having the five steps of the process execute automatically as much as possible will decrease the workload of the translation coordinators. Refer to Table 2 for the details of the workload comparison when using Launchpad or Transifex for DocBook translation.
Here are the conclusions of the comparison. (1) The workload of using Transifex is similar to that of using Launchpad. (2) The advantages of Launchpad are:
- It reuses the same user IDs and user groups as the developers, users, and translators of Gettext strings.
- It reuses the same contribution metric, "Karma", that already covers fixing bugs, answering questions, and translating Gettext strings.
(3) The advantage of Transifex is better translation memory support. The disadvantage of Transifex is that it has a separate user registration and a different user interface. Both the translators and the coordinators need to register on a new website and get familiar with a new user interface before translating.
Based on this analysis, I think that using Launchpad for the manuals translation is a good choice.
- Translation Dictionary
Translation Dictionary here means terminology translation. It is very helpful for ensuring translation quality. Unfortunately, neither Launchpad nor Transifex supports a Translation Dictionary. I suggest using wiki pages to document the terminology translations for translators' reference. Here is a sample wiki page from Eclipse globalization: http://wiki.eclipse.org/French_Glossary.
- Change Management
Launchpad and Transifex each support synchronizing old PO files with new PO files in their own way. They compare the new PO file with the existing one and handle the changes automatically. However, new PO files are not generated automatically after the DocBook files change. Translation coordinators need to generate new PO files by running a Python program manually. I suggest developing a program in the future to monitor updates to the manuals' GitHub repository. When a DocBook file is updated, a new PO file would be generated and synchronized with the old one on the Launchpad server.
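As a rough illustration of the monitoring program suggested above, here is a minimal Python sketch that detects which DocBook files changed since the previous run by comparing content hashes against a stored manifest. The function names, the manifest file, and the sample file name bk-compute.xml are assumptions. Fetching the repository, regenerating the PO files, and uploading them to Launchpad are left out.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's content."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def changed_docbooks(source_dir, manifest_path):
    """Return the XML files under source_dir whose content differs from the last run.

    The manifest is a JSON file mapping each path to the digest seen last time.
    It is rewritten on every call, so the next call compares against this run.
    """
    manifest_file = Path(manifest_path)
    previous = json.loads(manifest_file.read_text()) if manifest_file.exists() else {}
    current = {str(p): file_digest(p) for p in sorted(Path(source_dir).rglob("*.xml"))}
    manifest_file.write_text(json.dumps(current, indent=2))
    return [path for path, digest in current.items() if previous.get(path) != digest]


# Demonstration in a temporary directory; the file name is hypothetical.
with tempfile.TemporaryDirectory() as workdir:
    source = Path(workdir) / "doc"
    source.mkdir()
    manifest = Path(workdir) / "manifest.json"
    book = source / "bk-compute.xml"

    book.write_text("<book><title>v1</title></book>")
    first_run = changed_docbooks(source, manifest)   # file is new
    second_run = changed_docbooks(source, manifest)  # nothing changed
    book.write_text("<book><title>v2</title></book>")
    third_run = changed_docbooks(source, manifest)   # file was edited

print(len(first_run), len(second_run), len(third_run))
```

Each path returned by changed_docbooks would be a candidate for regenerating its PO file. A scheduled job or a repository hook could call it after every update of the manuals.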
- Machine translation
Is it necessary to include machine translation? Machine translation can be run before human review, so that translators do not need to translate from scratch. Translators can review the machine translation results and correct them. However, after some investigation, I found that the quality of the free machine translation services that expose an API is not very good, and I doubt whether poor-quality machine translation is helpful. In any case, if most of the community members want to include machine translation, it is possible to extend the slicing program to generate a PO file pre-filled with the machine translation results.
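If machine translation were included, the pre-filled PO file could look like the output of the following minimal Python sketch. The function names are hypothetical, and the machine translation service is replaced by a small dictionary lookup, since no specific service is proposed here. Each pre-filled entry is marked with the standard Gettext "fuzzy" flag, which marks a translation as needing review; for example, msgfmt skips fuzzy entries by default.

```python
def po_escape(text):
    """Escape backslashes and double quotes for a PO string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def prefill_po(messages, machine_translate):
    """Render a PO body whose msgstr values come from machine_translate.

    machine_translate is any callable that takes a source string and returns
    a suggested translation, or None when it has no suggestion.
    """
    entries = []
    for msgid in messages:
        suggestion = machine_translate(msgid)
        lines = []
        if suggestion:
            # Flag machine output so that it is reviewed, not trusted as final.
            lines.append("#, fuzzy")
        lines.append('msgid "{}"'.format(po_escape(msgid)))
        lines.append('msgstr "{}"'.format(po_escape(suggestion or "")))
        entries.append("\n".join(lines) + "\n")
    return "\n".join(entries)


# Stand-in for a real machine translation service (hypothetical data).
FAKE_ENGINE = {"Getting Started": "Prise en main"}

po_text = prefill_po(["Getting Started", "Overview"], FAKE_ENGINE.get)
print(po_text)
```

Whether the chosen web tool presents fuzzy entries as suggestions to translators would need to be checked before relying on this approach.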