Proposal for translation workflows for both source code and documentation sources
Set up a translation process for both source code and documentation sources
OpenStack Manuals are in DocBook format. The source is on GitHub: http://github.com/openstack/openstack-manuals. Launchpad and Transifex are free, web-based tools for crowd translation. Both provide a simple web interface through which non-technical people can help with translation. Neither supports DocBook directly, but both support the popular GNU Gettext file formats (POT, the PO Template, and PO).
To translate OpenStack Manuals, which are in DocBook format, into multiple languages, we can slice the documents into short statements, use a web-based translation management tool to manage the translation process, and finally converge the translated content into new copies of the DocBooks.
Here are the five steps of the translation process:
- Slicing - extract translatable content from the DocBooks and generate Gettext-compatible POT (PO Template) or PO files;
- Uploading - upload the POT (or PO) files to a web-based translation management tool;
- Downloading - download the PO (or MO) files from the web tool after translation and review;
- Converging - converge the translated content into new copies of the DocBooks, creating DocBooks in multiple languages;
- Generating - generate HTML/PDF in multiple languages from the translated DocBooks.
The attached picture illustrates these steps (see attached file: DocBook translation process.png).
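To make the data flow between the five steps concrete, here is a minimal sketch of how the files for one DocBook and one target language might be laid out. The `build/` directory structure and the `plan` helper are hypothetical illustrations, not part of the manuals repository. Step 2 (Uploading) has no local file of its own, since the POT file from Step 1 is what gets uploaded.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class BookArtifacts:
    """Files produced for one DocBook source and one target language."""
    source: PurePosixPath       # English DocBook (input to Step 1)
    template: PurePosixPath     # Step 1 Slicing: POT file, uploaded in Step 2
    translation: PurePosixPath  # Step 3 Downloading: translated PO file
    localized: PurePosixPath    # Step 4 Converging: translated DocBook copy
    output_dir: PurePosixPath   # Step 5 Generating: HTML/PDF output


def plan(source: str, lang: str, root: str = "build") -> BookArtifacts:
    """Map one DocBook file and language to its (hypothetical) build paths."""
    src = PurePosixPath(source)
    base = PurePosixPath(root)
    return BookArtifacts(
        source=src,
        # One template serves every language, so its path has no language part.
        template=base / "pot" / f"{src.stem}.pot",
        translation=base / "po" / lang / f"{src.stem}.po",
        localized=base / "docbook" / lang / src.name,
        output_dir=base / "output" / lang / src.stem,
    )


if __name__ == "__main__":
    for lang in ("de", "ja"):
        print(plan("admin-guide.xml", lang))
```

The POT path is shared across languages, while Steps 3 to 5 produce one file or directory per language, which is why only the template is generated once per DocBook.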
Comparison of Launchpad and Transifex
Launchpad and Transifex are similar web-based tools for crowd translation. The goal of the comparison is to find the most appropriate tool for this scenario. The comparison is made between Launchpad and the free version of Transifex for open source projects (refer to https://www.transifex.net/plans/ for details of that plan).
Based on the requirements for translating the manuals, the following aspects are considered:
- Supported format
- DocBook slicing support
- Converging support
- Source uploading method
- Output downloading method
- Translation Memory support
- Translation history support
- Change management
- Terminology / Glossary
Refer to Table 1 for detailed information on the comparison.
Another important measure for the comparison is workload. Automating as many of the five steps as possible will reduce the workload of translation coordinators. Refer to Table 2 for a detailed comparison of the workload when using Launchpad or Transifex for DocBook translation.
Here are the conclusions of the comparison: (1) the workload using Transifex is similar to that using Launchpad; (2) the advantages of Launchpad are:
- It uses the same user IDs and user groups as the developers, users, and translators of Gettext strings.
- It uses the same contribution measure, "Karma", which already credits fixing bugs, answering questions, and translating Gettext strings.
(3) The advantage of Transifex is better translation memory support. Its disadvantage is a separate user registration and user interface: both translators and coordinators need to register on a new website and get familiar with a new user interface before translating.
Based on this analysis, I think using Launchpad for the manuals translation is a good choice.
- Translation Dictionary
Translation Dictionary here means terminology translation, which is very helpful for ensuring translation quality. Unfortunately, neither Launchpad nor Transifex supports a Translation Dictionary. I suggest using wiki pages to document terminology translations for translators' reference. Here is a sample wiki page for Eclipse globalization: http://wiki.eclipse.org/French_Glossary.
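If the terminology is kept on a wiki page, coordinators could also copy it into a simple mapping and check translated strings against it. The sketch below is hypothetical: `check_terminology`, the `(msgid, msgstr)` entry format, and the German glossary terms are illustrative assumptions, and its plain substring test will miss inflected forms of a target term.

```python
import re
from typing import Dict, Iterable, List, Tuple


def check_terminology(
    entries: Iterable[Tuple[str, str]], glossary: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (msgid, term, expected) for translations missing the agreed term."""
    problems = []
    for msgid, msgstr in entries:
        if not msgstr:
            continue  # untranslated entries have nothing to check yet
        for term, expected in glossary.items():
            # Match the English term as a whole word, ignoring case.
            used = re.search(rf"\b{re.escape(term)}\b", msgid, re.IGNORECASE)
            # Plain substring test: inflected target forms are not recognized.
            if used and expected not in msgstr:
                problems.append((msgid, term, expected))
    return problems


# Illustrative glossary entries only, not an agreed OpenStack glossary.
GLOSSARY = {"instance": "Instanz", "flavor": "Variante"}

if __name__ == "__main__":
    entries = [
        ("Launch an instance.", "Starten Sie eine Instanz."),
        ("Choose a flavor.", "Wählen Sie einen Typ."),
        ("Delete the image.", ""),
    ]
    for msgid, term, expected in check_terminology(entries, GLOSSARY):
        print(f"{msgid!r}: '{term}' should be translated as '{expected}'")
```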
- Change Management
Launchpad and Transifex each support synchronizing old PO files with new PO files in their own way: they compare the new PO file with the existing one and handle the changes automatically. However, new PO files are not generated automatically when DocBooks change; translation coordinators need to generate them by running a Python program manually. I suggest developing a program in the future to monitor updates to the manuals GitHub repository. When a DocBook is updated, a new PO file would be generated and synchronized with the old one on the Launchpad server.
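The synchronization the tools perform can be approximated as follows, assuming each catalog is reduced to a `{msgid: msgstr}` mapping. This is a simplified, hypothetical sketch (`sync_catalog` is not part of either tool): it only reuses translations whose msgid is unchanged, whereas GNU gettext's `msgmerge` also does fuzzy matching for edited strings, and Launchpad and Transifex apply their own logic.

```python
from typing import Dict, Iterable, List, Tuple


def sync_catalog(
    new_msgids: Iterable[str], old_po: Dict[str, str]
) -> Tuple[Dict[str, str], List[str], List[str]]:
    """Merge an existing {msgid: msgstr} catalog into a freshly sliced template.

    Returns (merged catalog, msgids new to the catalog, obsolete msgids).
    """
    # Keep template order; reuse a translation only when the msgid is unchanged.
    merged = {msgid: old_po.get(msgid, "") for msgid in new_msgids}
    added = [msgid for msgid in merged if msgid not in old_po]
    # Strings removed from the DocBook; msgmerge would keep these as #~ entries.
    obsolete = sorted(set(old_po) - set(merged))
    return merged, added, obsolete


if __name__ == "__main__":
    old = {
        "Install Nova.": "Installieren Sie Nova.",
        "Configure Swift.": "Konfigurieren Sie Swift.",
    }
    new = ["Install Nova.", "Configure Glance.", "Start the service."]
    print(sync_catalog(new, old))
```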
- Machine translation
Is it necessary to include machine translation? Machine translation can be run before human review, so translators won't need to translate from scratch; they can review the machine translation results and correct them. However, after investigation, I found that the quality of the free machine translation services that expose an API is not very good, and I doubt whether poor-quality machine translation is helpful. Anyway, if most community members want to include machine translation, the slicing program can be improved to generate a PO file containing the machine translation results.
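If machine translation is added, one option is to pre-fill empty entries and mark them with the standard Gettext `fuzzy` flag so reviewers can find them. In the sketch below, `translate` is a placeholder for whatever machine translation API is chosen (none is called here), `prefill_po` is a hypothetical helper, and the PO header entry is omitted for brevity.

```python
from typing import Callable, Dict


def po_escape(text: str) -> str:
    """Escape a string for use inside a double-quoted PO field."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\t", "\\t")
    )


def prefill_po(entries: Dict[str, str], translate: Callable[[str], str]) -> str:
    """Render PO entries, filling empty msgstrs with machine translation marked fuzzy."""
    chunks = []
    for msgid, msgstr in entries.items():
        lines = []
        if not msgstr:
            suggestion = translate(msgid)  # placeholder for the chosen MT service
            if suggestion:
                msgstr = suggestion
                lines.append("#, fuzzy")  # needs human review before use
        lines.append(f'msgid "{po_escape(msgid)}"')
        lines.append(f'msgstr "{po_escape(msgstr)}"')
        chunks.append("\n".join(lines))
    return "\n\n".join(chunks) + "\n"


def fake_mt(text: str) -> str:
    """Stand-in for a real machine translation call."""
    return "[MT] " + text


if __name__ == "__main__":
    print(prefill_po({"Hello": "Hallo", "Save the file": ""}, fake_mt))
```

Marking these entries fuzzy is a deliberate choice: `msgfmt` skips fuzzy entries by default, so unreviewed machine translations would not reach the generated manuals.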
Table 1 Comparison of Launchpad, Transifex and Pootle
|Supported format||POT file (.pot), PO file (.po)|
|DocBook Slicing support||No|
|Source uploading method||Two methods: (a) automatic template imports from a Bazaar branch; (b) manual upload of a template (or an archive) through Launchpad's web interface.|
|Output downloading method||Two methods: (a) automatic saving of output files to a Bazaar branch; (b) manual download of output files through the web interface.|
|Translation Memory support||Identical translation items from other projects can be listed as references.|
|Translation history support||Yes|
|Change management||Launchpad automatically updates its data every time you push a new revision to the Bazaar branch.|
Table 2 Workload comparison when using Launchpad, Transifex or Pootle for DocBook translation
|Step 1: Slicing||A Python program can be used to slice all the DocBooks with one command.|
|Step 2: Uploading||If the source is synchronized with Bazaar, uploading can be handled automatically by Launchpad.|
|Step 3: Downloading||Launchpad can commit daily snapshots of the translations to a specific folder in a Bazaar branch.|
|Step 4: Converging||A Python program can be used to converge all the PO files back into DocBooks.|
|Step 5: Generating||A Maven command can be used to generate HTML/PDF from the DocBooks.|
The Python program for slicing can be written based on “xml2po” to slice all DocBooks of the manuals project into translatable strings in batch. “xml2po” is an existing Python program in the GNOME gnome-doc-utils package that extracts translatable content from free-form XML documents and outputs Gettext-compatible POT files.
The Python program for converging can also be written based on “xml2po”, to converge the translated strings back into copies of the DocBooks in batch.
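For illustration only, the slicing idea can be sketched with the Python standard library. This is not xml2po: the `TRANSLATABLE` element set and the helper names are hypothetical, inline markup such as `<command>` is flattened to plain text (xml2po keeps inline tags inside msgids), nested block elements are not handled, and the POT header is omitted.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical choice of DocBook elements treated as one translatable unit each.
TRANSLATABLE = {"title", "para", "simpara", "term"}


def po_escape(text: str) -> str:
    """Escape a string for use inside a double-quoted PO field."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def extract_messages(xml_text: str) -> List[str]:
    """Collect unique, whitespace-normalized texts of translatable elements."""
    root = ET.fromstring(xml_text)
    seen, messages = set(), []
    for elem in root.iter():
        tag = elem.tag.split("}")[-1]  # strip a namespace, e.g. DocBook 5's
        if tag not in TRANSLATABLE:
            continue
        # itertext() flattens inline children such as <command> into plain text.
        text = " ".join("".join(elem.itertext()).split())
        if text and text not in seen:
            seen.add(text)
            messages.append(text)
    return messages


def to_pot(messages: List[str]) -> str:
    """Render messages as POT entries with empty msgstrs (header omitted)."""
    return "\n\n".join(f'msgid "{po_escape(m)}"\nmsgstr ""' for m in messages) + "\n"


if __name__ == "__main__":
    doc = (
        '<chapter xmlns="http://docbook.org/ns/docbook"><title>Install</title>'
        "<para>Run <command>nova list</command>  to check.</para></chapter>"
    )
    print(to_pot(extract_messages(doc)))
```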
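A matching stdlib-only sketch of converging follows, again a simplified stand-in for xml2po rather than its actual behavior. It assumes the same hypothetical `TRANSLATABLE` element set, replaces only elements without inline children, and keeps the English text when a translation is missing or empty.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Same hypothetical set of translatable elements as used for slicing.
TRANSLATABLE = {"title", "para", "simpara", "term"}

# Keep DocBook 5's default namespace on output instead of an "ns0:" prefix.
ET.register_namespace("", "http://docbook.org/ns/docbook")


def converge(xml_text: str, catalog: Dict[str, str]) -> str:
    """Return a translated copy of a DocBook string using a {msgid: msgstr} catalog."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        tag = elem.tag.split("}")[-1]
        # Skip non-translatable elements and those with inline children,
        # whose markup this simplified sketch cannot rebuild.
        if tag not in TRANSLATABLE or len(elem) > 0:
            continue
        source = " ".join((elem.text or "").split())
        translated = catalog.get(source)
        if translated:  # missing or empty translations keep the English text
            elem.text = translated
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    doc = "<chapter><title>Install</title><para>Start it.</para></chapter>"
    print(converge(doc, {"Install": "Installation", "Start it.": ""}))
```

Running `converge` once per language, with that language's downloaded PO catalog, yields the per-language DocBook copies that Step 5 turns into HTML/PDF.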