Translation Infrastructure workflow and setup

The OpenStack Infrastructure team uses a series of scripts to interact with Transifex in order to manage translation changes. This document explains the infrastructure workflow and the scripts used to accomplish this.

Transifex information

List of all projects where translations are set up: https://www.transifex.com/organization/openstack

Projects

Transifex has a separate project for each OpenStack project with two exceptions:

The openstack-manuals Transifex project is used for all documentation projects: operations-guide, ha-guide, security-guide, openstack-manuals, api-site. See Documentation Projects below.
The horizon Transifex project is used for horizon and django_openstack_auth. See Horizon below.

Each Transifex project has several resources, which are visible only to people logged in to Transifex. For example, Barbican, like all Python projects, has these resources:

barbican-log-critical-translations: Translations of LOG.critical messages
barbican-log-error-translations: Translations of LOG.error messages
barbican-log-info-translations: Translations of LOG.info messages
barbican-log-warning-translations: Translations of LOG.warning messages
barbican-translations: Translations of the "normal" messages
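
The naming pattern in the list above can be illustrated with a short sketch. The helper name resource_names and the pattern itself are assumptions inferred from the Barbican example; they are not taken from the infrastructure scripts.

```python
# Hypothetical helper: the "<project>-log-<level>-translations" pattern is
# inferred from the Barbican example above, not from the real scripts.
LOG_LEVELS = ("critical", "error", "info", "warning")


def resource_names(project):
    """Return the Transifex resource names expected for a Python project."""
    names = [f"{project}-log-{level}-translations" for level in LOG_LEVELS]
    # The resource holding the "normal" (non-log) messages comes last.
    names.append(f"{project}-translations")
    return names


for name in resource_names("barbican"):
    print(name)
```

Calling the helper with another repository name would produce the corresponding list for that project, assuming it follows the same convention.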

Workflow

All Python projects are set up the same way, so let's use $PROJECT here. $PROJECT is the name of the repository.

The translation files live in the $PROJECT repository, in the directory $PROJECT/locale/.

The process is as follows:

A patch to $PROJECT merges to git (this applies to each and every patch)
A post job called $PROJECT-upstream-translation-update is run that calls project-config/jenkins/scripts/upstream_translation_update.sh
The script extracts any string that is marked for translation and updates the pot files in $PROJECT/locale/
The script sends these changes to Transifex

Now the translation process starts: translators write translations for each resource of the project. Translators can also review the translations in Transifex. Every morning at 6:00 UTC our periodic scripts are executed:

The job $PROJECT-propose-translation-update is executed. It checks out the repository and calls the script project-config/jenkins/scripts/propose_translation_update.sh.
The script connects to Transifex and, for each resource, downloads all "sufficiently translated" files. We define "sufficiently translated" as at least 75 per cent of the strings being translated.
The script generates the pot files for the project since the pot files are outdated (upstream_translation_update does not touch the git repo)
The script merges translations and pot files
The script removes translation files that have no strings anymore or are less than 20 per cent translated (See: Translation percentage changes)
The script filters the translated files and does some heuristics to not update too often (for example, if a project has no update for translated files, then it does not update the pot files)
If there are changes in the repository, a patch is proposed to the project
The patch goes through the usual review process in Gerrit, with one exception: most projects will not wait for a second core reviewer and will approve the patch if it passes all tests. The cores do not review the translations themselves; that review is done by translators in Transifex. The cores just check that the overall content looks fine.
If there are problems with the patch, a core should send a message to the i18n team describing the problem. Then somebody needs to investigate whether the translations or the tools are broken.

Since the periodic scripts are run every morning (UTC), the usual updates happen:

The scripts check whether there is an old patch and re-use its Change-Id, so that the update becomes a new patchset.
Any votes on the patch are reset (+1, -1, WIP and +2; a -2 is sticky).
One caveat: if a patch is approved and in the gate, no new patch is proposed. We sometimes have very long queues (longer than 25 hours), and without this check a patch might never go in, since a new patch would be proposed every 24 hours and approving it takes time.
If a patch gets abandoned by the cores, a new patch will get proposed if there is new content.
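
The rules above can be summarised as a small decision function. This is only a sketch of the described behaviour, not the logic of the real scripts: the ExistingPatch class, its fields, the plan_proposal function and the returned strings are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExistingPatch:
    """Hypothetical view of a previously proposed translation patch."""
    change_id: str
    status: str  # assumed values: "open" or "abandoned"
    approved_in_gate: bool = False


def plan_proposal(existing: Optional[ExistingPatch], has_new_content: bool) -> str:
    """Decide what the daily run should do, following the rules above."""
    if not has_new_content:
        return "nothing to propose"
    if existing is None or existing.status == "abandoned":
        # No reusable patch: a fresh change is proposed.
        return "propose new change"
    if existing.approved_in_gate:
        # Avoid resetting an approved patch that is waiting in a long gate queue.
        return "skip: patch is approved and in the gate"
    # Re-using the Change-Id turns the update into a new patchset.
    return f"upload new patchset for {existing.change_id}"


print(plan_proposal(ExistingPatch("I1234", "open"), has_new_content=True))
```

The Change-Id "I1234" is a placeholder. The order of the checks is a design choice of this sketch; the source only states the individual rules.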

Projects that are sufficiently translated, like Nova, get a new patch proposed nearly every day. Projects that have no translations yet, like Swift, should not get any patches proposed.

Marked for translation

Strings are marked in Python code using the gettext conventions (see also the oslo.i18n documentation):

_("some string") - is a translatable string, e.g. print(_("translate me"))
For logging, it's LOG.info(_LI("info message")) - and there's also _LE for LOG.error, _LW for LOG.warning, _LC for LOG.critical
Normal strings like print("Do not translate") are not translated

Translation percentage changes

All translations are stored in Transifex; the files in $PROJECT are just copies. For consistency, we want translation files that are sufficiently translated: it makes no sense to ship a file in which one out of five strings is translated and the rest is in English. Therefore, we only download files that are at least 75 per cent translated. If files grow over time but do not get new translations (or the strings change too much), we remove them from the project again once they fall below a threshold of 20 per cent. This reduces the volume of changes that go into a project. Without these thresholds, a large patch would go in each day; now the patches are much smaller.
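
The two thresholds can be expressed as a pair of checks. This is a sketch under stated assumptions: the function names and the way counts are passed in are hypothetical, and only the 75 and 20 per cent figures come from the text above.

```python
DOWNLOAD_THRESHOLD = 75  # per cent: minimum needed to download a file
REMOVE_THRESHOLD = 20    # per cent: files below this are removed again


def percent_translated(translated, total):
    """Return the translated share of a file in per cent."""
    if total == 0:
        return 0.0
    return 100.0 * translated / total


def should_download(translated, total):
    """True if the file is at least 75 per cent translated."""
    return percent_translated(translated, total) >= DOWNLOAD_THRESHOLD


def should_remove(translated, total):
    """True if the file has no strings or is less than 20 per cent translated."""
    return total == 0 or percent_translated(translated, total) < REMOVE_THRESHOLD


for translated in (150, 80, 30):
    print(translated, should_download(translated, 200), should_remove(translated, 200))
```

A file between the two thresholds (80 of 200 strings in the loop above) is neither newly downloaded nor removed, which is what keeps files from flapping in and out of the repository.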

Horizon

Horizon and django_openstack_auth use their own scripts for the translation process (propose_translation_update_django_openstack_auth.sh, propose_translation_update_horizon.sh, upstream_translation_django_openstack_auth.sh, upstream_translation_horizon.sh). These do essentially the same as described above. Both repositories share a common Transifex project called horizon.

Documentation projects

For documentation projects, the Transifex project openstack-manuals is used. It currently has 28 resources, one for each manual. It is used by the OpenStack repositories api-site, ha-guide, openstack-manuals, operations-guide, and security-guide.

The process is the same as for the Python projects, but separate scripts are used (propose_translation_update_manuals.sh and upstream_translation_update_manuals.sh). Documentation files do not mark strings; instead, all strings are extracted. We have special scripts that extract all strings from the DocBook files and place them in pot files. The extraction is done by tools/generatepot, which lives in each documentation repository. There is also a script openstack-generate-docbook (created by os_doc_tools/handle_pot.py when you run `python setup.py install`) that is used to merge strings back into DocBook XML.

All documentation projects with translations also have a separate Jenkins check that builds the localized manuals, and these are also published on docs.openstack.org.

Note that for downloading manuals we also use the 75 per cent rule, with the exception of two targets: the glossary (openstack-manuals/doc/glossary) and common (openstack-manuals/doc/common) directories. Since these are large and only a part of each is used, the limit for them is 8 per cent.
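
The per-directory limit described above could be looked up as follows. This is an illustrative sketch only: the constant and function names are hypothetical, and the real manuals scripts may encode the exception differently.

```python
DEFAULT_THRESHOLD = 75  # per cent, the general rule
REDUCED_THRESHOLD = 8   # per cent, for the two large shared directories

# The two exceptions named in the text above.
REDUCED_DIRS = (
    "openstack-manuals/doc/glossary",
    "openstack-manuals/doc/common",
)


def download_threshold(directory):
    """Return the minimum translated percentage for a manual directory."""
    normalized = directory.rstrip("/")
    if normalized in REDUCED_DIRS:
        return REDUCED_THRESHOLD
    return DEFAULT_THRESHOLD


print(download_threshold("openstack-manuals/doc/glossary"))
print(download_threshold("openstack-manuals/doc/install-guide"))
```

The directory openstack-manuals/doc/install-guide is used here purely as an example of a manual that falls under the general rule.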

String Freeze

Note that string freeze is for developers. The goal is to give the translators a chance to catch up and not have strings changing all the time. There's no enforcement of string freeze in our infrastructure.
