Translation Infrastructure workflow and setup
The OpenStack Infrastructure team uses a series of scripts to interact with Zanata in order to manage translation changes. This document explains the infrastructure workflow and the scripts used to accomplish this.
These scripts are run after each change to a project to push changes to Zanata. Every day at 6:00 UTC, the "OpenStack Proposal Bot" imports these changes with the subject "Imported Translations from Zanata".
List of all projects where translations are set up: https://translate.openstack.org/project/list
Projects are set up from projects.yaml for repositories where the "translate" option is set.
All the files in $PROJECT/locale are handled by these scripts. A manual patch for these files is needed only for an initial translation setup or in case of problems.
Zanata has a separate project for each OpenStack project.
For each Python project in Zanata there are several resources. For example, Barbican - like all Python projects - has these documents:
- barbican-log-critical-translations: Translations of LOG.critical messages
- barbican-log-error-translations: Translations of LOG.error messages
- barbican-log-info-translations: Translations of LOG.info messages
- barbican-log-warning-translations: Translations of LOG.warn messages
- barbican-translations: Translations of the "normal" messages
UI projects like horizon and documentation projects have different documents to translate.
All translations are stored in Zanata; the files in $PROJECT are just copies. For consistency, we want translation files that are sufficiently translated: there is no point in a file where one out of five strings is translated and the rest is in English. Therefore, we defined three thresholds:
- Only download new files that are at least X percent translated; such files are then updated as they grow over time.
- However, if there are no new translations (or the strings change too much), we remove the files from the project again once they fall below a threshold of Y percent. This reduces the number of patches that go into a project; without these thresholds, a large patch would go in each day.
- There is also a lower threshold for releases, Z percent of messages translated, which is a policy that is only enforced manually.
The X, Y, and Z values can be found at http://docs.openstack.org/developer/i18n/infra.html#translation-jobs (under discussion; currently X is 75, Y is 40, and Z is 66).
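The way these percentages interact can be sketched as a small decision function. This is an illustrative sketch only, not the code the infrastructure scripts use: the function names and the return labels are hypothetical, and the constants simply mirror the values quoted above.

```python
# Illustrative sketch only; names and return labels are hypothetical.
# Threshold values mirror the numbers quoted above (X=75, Y=40, Z=66).
X_DOWNLOAD = 75   # minimum percent for importing a new translation file
Y_REMOVE = 40     # below this, an existing file is removed from git
Z_RELEASE = 66    # release policy, only enforced manually


def percent_translated(translated, total):
    """Return the share of translated messages as a percentage."""
    if total == 0:
        return 0.0
    return 100.0 * translated / total


def decide(translated, total, already_in_git):
    """Return 'import', 'keep', 'remove', or 'skip' for one translation file."""
    pct = percent_translated(translated, total)
    if already_in_git:
        # Existing files are only dropped once they fall below Y percent
        # (a file with no strings left counts as 0 percent).
        return "remove" if pct < Y_REMOVE else "keep"
    # New files are only imported once they reach X percent.
    return "import" if pct >= X_DOWNLOAD else "skip"


def meets_release_policy(translated, total):
    """Check the manually enforced release threshold Z."""
    return percent_translated(translated, total) >= Z_RELEASE
```

Because the import threshold is higher than the removal threshold, a file that hovers around one value does not get added and removed on alternating days.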
All Python projects are set up the same way, so let's use $PROJECT here. $PROJECT is the name of the repository.
The files live in the $PROJECT repository, in the directory $PROJECT/locale/
The process is as follows:
- A patch to $PROJECT merges to git (this applies to each and every patch)
- A post job called $PROJECT-upstream-translation-update is run that calls project-config/jenkins/scripts/upstream_translation_update.sh
- The script extracts any string that is marked for translation and updates the pot files in $PROJECT/locale/
- The script sends these changes to the Zanata instance
Now the translation process starts: translators write translations for each resource of the project. Translators can also review the translations in Zanata. Every morning at 6:00 UTC our periodic scripts are executed:
- The job $PROJECT-propose-translation-update is executed. It checks out the repository and calls the script project-config/jenkins/scripts/propose_translation_update.sh.
- The script connects to Zanata and, for each resource, asks to download all new "sufficiently translated" files. We define "sufficiently" as at least X percent of strings being translated.
- The script connects to Zanata and asks to download updates for each existing resource.
- The script generates the pot files for the project since the pot files are outdated (upstream_translation_update does not touch the git repo)
- The script merges translations and pot files
- The script removes translation files that have no strings anymore or are less than Y percent translated
- The script filters the translated files and does some heuristics to not update too often (for example, if a project has no update for translated files, then it does not update the pot files)
- The script removes location information and empty strings from the language files.
- If there are changes in the repository, a patch is proposed to the project
- The patch goes through the usual review process in Gerrit with one exception: most projects will not wait for a second core and will check it in if it passes all tests. The cores do not review the translations themselves; that review is done by translators in Zanata. The cores just check that the overall content looks fine.
- If there is trouble with the patch, a core should send a message to the i18n team and tell them about the problem. Then somebody needs to investigate whether translations or tools are broken.
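The step that removes location information and empty strings from the language files can be illustrated as follows. This is a simplified sketch and does not reproduce how the infra scripts do it: the function names are hypothetical, and the parsing assumes that entries in the po file are separated by blank lines.

```python
def _unquote(value):
    """Return the text between the surrounding double quotes, if present."""
    value = value.strip()
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return value[1:-1]
    return value


def strip_po(text):
    """Drop '#:' location comments and entries whose msgstr is empty."""
    kept = []
    for block in text.strip().split("\n\n"):
        lines = [ln for ln in block.splitlines() if not ln.startswith("#:")]
        msgstr, field, seen_msgid = "", None, False
        for ln in lines:
            if ln.startswith("msgid "):
                field, seen_msgid = "msgid", True
            elif ln.startswith("msgstr"):
                # Also matches plural forms such as msgstr[0].
                field = "msgstr"
                msgstr += _unquote(ln.partition(" ")[2])
            elif ln.startswith('"') and field == "msgstr":
                # Continuation line of a multi-line msgstr.
                msgstr += _unquote(ln)
        # Keep comment-only blocks and every entry that has a translation;
        # the header entry survives because its msgstr holds the metadata.
        if lines and (msgstr or not seen_msgid):
            kept.append("\n".join(lines))
    return "\n\n".join(kept) + "\n"
```

Stripping the `#:` lines keeps the proposed patches small, since a source line moving in the code no longer changes every language file.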
Since the periodic scripts are run every morning (UTC), the usual updates happen:
- The scripts check whether there is an old patch and re-use its Change-Id so that the update becomes a new patchset.
- All votes on the patch are reset (+1, -1, WIP, and +2); only a -2 is sticky.
- One caveat: if a patch is approved and in the gate, no new patch gets proposed. We sometimes have very long queues (longer than 25 hours); without this check, a patch might never go in, since a new patch would be proposed every 24 hours and approving it takes time.
- If a patch gets abandoned by the cores, a new patch will get proposed if there is new content.
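The rules for the daily proposal can be summarized in a short decision function. This is a sketch under stated assumptions: the `Change` class, its fields, and the action labels are hypothetical and only model the behaviour described in the list above, not the actual scripts or the Gerrit API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Change:
    """Hypothetical model of a previously proposed translation patch."""
    change_id: str
    status: str            # "open", "abandoned", or "merged"
    in_gate: bool = False  # approved and waiting in the gate queue


def plan_proposal(previous: Optional[Change],
                  has_new_content: bool) -> Tuple[str, Optional[str]]:
    """Return (action, change_id_to_reuse) for today's periodic run."""
    if not has_new_content:
        # Nothing changed in the repository, so nothing is proposed.
        return ("nothing", None)
    if previous is not None and previous.status == "open":
        if previous.in_gate:
            # Do not disturb a patch that is already approved and gating.
            return ("skip", previous.change_id)
        # Re-use the Change-Id so the update becomes a new patchset.
        return ("new_patchset", previous.change_id)
    # No earlier patch, or it was abandoned or merged: start a new change.
    return ("new_change", None)
```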
Projects that are sufficiently translated, like Nova, get a new patch proposed nearly every day. Projects that have no translations yet, like Swift, shouldn't get any patches proposed.
Note that only the *.pot files contain location information such as filename and line number, and these are pushed to Zanata. The translation files (*.po) contain neither location information nor empty strings. If you need those, either download the full po file from Zanata or run
msgmerge PO-FILE POT-FILE -o FULL-PO-FILE
Marked for translation
Strings are marked in Python code using the gettext conventions (see also the oslo.i18n documentation):
- `_("some string")` is a translatable string, e.g. `print(_("translate me"))`
- For logging, it's `LOG.info(_LI("info message"))`, and there are corresponding markers for the other log levels
- Normal strings like `print("Do not translate")` are not translated
Horizon and django_openstack_auth use their own scripts (propose_translation_update_django_openstack_auth.sh, propose_translation_update_horizon.sh, upstream_translation_django_openstack_auth.sh, upstream_translation_horizon.sh) for the translation process which basically do the same as mentioned above.
For documentation repositories like api-site, ha-guide, openstack-manuals, operations-guide, and security-guide, a separate project is used as well.
The process is the same as for the Python projects but separate scripts are used (propose_translation_update_manuals.sh and upstream_translation_update_manuals.sh). Documentation files do not mark strings; all strings are extracted. We have special scripts that extract all strings from the DocBook and RST files and place them in pot files. The extraction is done by tools/generatepot, which lives in each documentation repository. There is also a script openstack-generate-docbook (created by os_doc_tools/handle_pot.py when you run `python setup.py install`) that is used to merge strings back into DocBook XML.
All documentation projects with translations also have a separate Jenkins check which builds the localized manuals, and these are also published on docs.openstack.org.
For building localized documents, the script `doc-tools-check-languages` is used; it lives in the openstack-doc-tools repository. It is configured via `doc-tools-check-languages.conf`. The syncing scripts to our translation server also parse this file.
In `doc-tools-check-languages.conf`, the array `SPECIAL_BOOKS` is used by the syncing scripts. The other values are used by `doc-tools-check-languages`:
- `skip`: Do not send content of this directory to translation server
- `RST`: This directory is using RST as document format and therefore needs to be treated differently than the DocBook XML files.
Note that string freeze is for developers. The goal is to give the translators a chance to catch up and not have strings changing all the time. There's no enforcement of string freeze in our infrastructure.
Note that the following information is current for the Liberty release (October 2015).
Challenge: How to translate the release branch while the master branch is already open for new releases?
Proposal: Create stable branches for all projects that have current translations with main files (so, not only LOG files).
- Updated translations of files that are sufficiently translated (> X %)
- Updated po source file that reflect state of repository
- No empty/mainly untranslated language files (< Z % is rule of thumb)
The following projects have translations and branches and will get a stable branch:
List of projects that branch with translations:
List of projects branching later:
- openstack/openstack-manuals - to translate Install Guide
Not translated projects
List of projects that branch without translations in git :
List of projects that branch with only LOG translations in git:
List of projects not branching:
List of libraries and client, we're not covering them in separate branches for now:
- Before RC1 is cut, translators work with master version.
- When stable branch is cut, branches get created in Zanata as well. Syncing will get enabled for them.
- Translators focus on translation of stable branch.
- Once translators are finished with translations, a copy of new translations from stable branch to master is done in Zanata.
- Open master version to accept translations.
Note: Around RC1 time, a patch to clean up "old" translations gets proposed to both stable and master to remove any translations in git that are less than Z % translated (we normally only delete files that are less than Y % translated, to avoid too frequent deletion).
The following proposal needs buy-in from stable maintainer team and I18n team:
- Before RC1 is cut, translators work with master version.
- When stable branch is cut, branches get created in Zanata as well. Syncing to projects will get enabled for them.
- Translators focus on translation of stable branch. Master branch is made read-only.
- The release manager will cut an RC2 including the current state of the translators' work. RC2 is normally cut 10 days after RC1.
- Once RC2 is created, a copy of new translations from stable branch to master is done in Zanata.
- Master branch is made writeable and translation can now be done for both stable branch and master.
- Translations of stable branch will continue to be synced and translators can continue translation.
- Shortly before RC1 of the next stable release is cut, the stable branch in Zanata is closed. So, there is always a single stable branch (latest release) and the master to translate on.
Note that after the RC2 merge, there is no automatic copying done from stable branch to master. Translation teams need to translate strings in both master and stable, if they choose to continue translating stable. If a merge is needed, it needs to be requested by the translation team for a specific project.
Note that stable branches will only be created for repositories that already have at least *one* translation of the main file. Repositories with only translated log files or with no translations at all are not branched. We consider a file translated if at least X percent of its messages are translated.
Note: Around RC1 time, a patch to clean up "old" translation files gets proposed to master - and to the stable branch if already cut - to remove any translations in git that are less than Z % translated (the infra jobs only delete files from git that are less than Y % translated, to avoid too frequent deletion).