= Translation Infrastructure workflow and setup =

The OpenStack Infrastructure team uses a series of scripts to interact with Zanata, a translation platform, in order to manage translation changes. This document explains the infrastructure workflow and the scripts used to accomplish this.

These scripts run after each change merges to a project and push the updated source strings to Zanata. Every day at 6:00 UTC, the "OpenStack Proposal Bot" imports the translation changes from Zanata and proposes them with the subject "Imported Translations from Zanata".

For more information about translations, see also the translations page and the i18n team page.

Zanata information
List of all projects where translations are set up: https://translate.openstack.org/project/list

Projects are set up from projects.yaml for repositories where the "translate" option is set.
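
As a rough illustration, picking the repositories to set up can be thought of as a filter over the entries of projects.yaml. The sketch below uses an inline list instead of the parsed file. The entry layout (a `project` key plus an `options` list) and the `example-*` entries are assumptions made for illustration, not a copy of the real file.

```python
# Hypothetical, simplified stand-in for the parsed entries of projects.yaml.
PROJECTS = [
    {"project": "openstack/nova", "options": ["translate"]},
    {"project": "openstack/example-lib", "options": []},
    {"project": "openstack/horizon", "options": ["translate"]},
    {"project": "openstack/example-tool"},  # no options key at all
]


def translated_projects(entries):
    """Return the names of all entries whose options include "translate"."""
    return [
        entry["project"]
        for entry in entries
        if "translate" in entry.get("options", [])
    ]


if __name__ == "__main__":
    for name in translated_projects(PROJECTS):
        print(name)
```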

Affected files
All files in $PROJECT/locale are handled by these scripts. A manual patch for these files is needed only for an initial translation setup or in case of problems.

Projects
Zanata has a separate project for each OpenStack project.

For each Python project in Zanata there are several resources. For example, Barbican, like all Python projects, has these documents:
 * barbican-log-critical-translations: Translations of LOG.critical messages
 * barbican-log-error-translations: Translations of LOG.error messages
 * barbican-log-info-translations: Translations of LOG.info messages
 * barbican-log-warning-translations: Translations of LOG.warn messages
 * barbican-translations: Translations of the "normal" messages
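
The naming pattern in the list above can be sketched as a small helper. This is only an illustration derived from the Barbican example; the function name `zanata_documents` is hypothetical and not part of the infrastructure scripts.

```python
# Log levels that get their own document, taken from the Barbican list.
LOG_LEVELS = ("critical", "error", "info", "warning")


def zanata_documents(project):
    """Return the document names a Python project would have in Zanata."""
    documents = [f"{project}-log-{level}-translations" for level in LOG_LEVELS]
    # The "normal" messages live in a document without a log level.
    documents.append(f"{project}-translations")
    return documents


if __name__ == "__main__":
    for name in zanata_documents("barbican"):
        print(name)
```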

UI projects like horizon and documentation projects have different documents to translate.

Translation percentage
All translations are stored in Zanata; the files in $PROJECT are just copies. For consistency, we want translation files that are sufficiently translated: there is no sense in a file where one out of 5 strings is translated and the rest is in English. Therefore, we defined percentage thresholds together with the I18n team; the values are described in http://docs.openstack.org/developer/i18n/infra.html#translation-jobs.

Workflow
All Python projects are set up the same way, so let's use $PROJECT here. $PROJECT is the name of the repository.

The translation files live in the $PROJECT repository, in the directory $PROJECT/locale/.

The process is as follows:
 * 1) A patch to $PROJECT merges to git (this applies to each and every patch)
 * 2) A post job called $PROJECT-upstream-translation-update is run that calls project-config/jenkins/scripts/upstream_translation_update.sh
 * 3) The script extracts any string that is marked for translation and updates the pot files in $PROJECT/locale/
 * 4) The script sends these changes to the Zanata instance

Now the translation process starts: translators write translations for each resource of the project. Translators can also review the translations in Zanata. Every morning at 6:00 UTC our periodic scripts are executed:

 * 1) The job $PROJECT-propose-translation-update is executed; it checks out the repository and calls the script project-config/jenkins/propose_translation_update.sh.
 * 2) The script connects to Zanata and downloads, for each resource, all new "sufficiently translated" files. "Sufficiently" means that at least X percent of the strings are translated.
 * 3) The script connects to Zanata and downloads updates for each existing resource.
 * 4) The script generates the pot files for the project since the pot files are outdated (upstream_translation_update does not touch the git repo)
 * 5) The script merges translations and pot files
 * 6) The script removes translation files that have no strings anymore or are less than Y percent translated
 * 7) The script filters the translated files and applies some heuristics to avoid updating too often (for example, if a project has no updates to its translated files, then the pot files are not updated either)
 * 8) The script removes location information and empty strings from the language files.
 * 9) If there are changes in the repository, a patch is proposed to the project
 * 10) The patch goes through the usual review process in Gerrit with one exception: most projects do not wait for a second core reviewer and approve the patch if it passes all tests. The cores do not review the translations themselves; that review is done by translators in Zanata. The cores just check that the overall content looks fine.
 * 11) If there is trouble with the patch, a core should send a message to the i18n team describing the problem. Then somebody needs to investigate whether the translations or the tools are broken.
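
The threshold handling in steps 2, 3 and 6 can be sketched as a small decision function. This is an illustration under stated assumptions, not the logic of the actual shell scripts: the function names are hypothetical, "no strings anymore" is read as "no translated strings", and the thresholds are parameters because X and Y are defined elsewhere.

```python
def percent_translated(translated, total):
    """Share of translated strings, as a percentage (0.0 for an empty file)."""
    if total == 0:
        return 0.0
    return 100.0 * translated / total


def decide(translated, total, in_git, download_min, keep_min):
    """Pick an action for one language file of one resource.

    download_min plays the role of X and keep_min the role of Y in the text;
    both are passed in because this sketch does not know the real values.
    """
    pct = percent_translated(translated, total)
    if in_git:
        # Step 6: remove files without translated strings or below Y percent.
        if translated == 0 or pct < keep_min:
            return "remove"
        # Step 3: existing files simply receive updates.
        return "update"
    # Step 2: new files are only downloaded once they reach X percent.
    return "download" if pct >= download_min else "skip"


if __name__ == "__main__":
    # 75 and 20 are made-up placeholder thresholds, not the project's values.
    print(decide(80, 100, in_git=False, download_min=75, keep_min=20))
    print(decide(10, 100, in_git=True, download_min=75, keep_min=20))
```

Using two separate thresholds means a file that was once good enough to be imported is not deleted again as soon as a few new source strings lower its percentage.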

Since the periodic scripts run every morning (UTC), the usual updates happen as follows:
 * The scripts check whether there is an old patch and re-use its Change-Id so that the update becomes a new patchset.
 * Any votes on the patch get reset (so +1, -1, WIP and +2; a -2 is sticky).
 * One caveat: if a patch is approved and in the gate, then no new patch gets proposed. We sometimes have very long queues (longer than 25 hours), and without this check a patch might never go in, since a new patch would be proposed every 24 hours and then need time to be approved again.
 * If a patch gets abandoned by the cores, a new patch will get proposed if there is new content.
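
The rules above amount to a small decision table for each daily run. The following sketch is only an illustration: the function, the dictionary keys `status` and `in_gate`, and the returned strings are hypothetical and do not correspond to the Gerrit API or to the real scripts.

```python
def next_action(existing, has_new_content):
    """Decide what the daily run does for one repository.

    existing is None when no earlier proposal exists; otherwise it is a
    dict with the hypothetical keys "status" and "in_gate".
    """
    if not has_new_content:
        return "nothing to propose"
    if existing is None or existing["status"] == "abandoned":
        return "propose new change"
    if existing.get("in_gate", False):
        # Approved and in the gate: do not disturb it.
        return "leave existing patch alone"
    # Open patch: re-use its Change-Id, which resets votes except a -2.
    return "upload new patchset"


if __name__ == "__main__":
    print(next_action(None, True))
    print(next_action({"status": "open", "in_gate": True}, True))
    print(next_action({"status": "open", "in_gate": False}, True))
```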

Projects that are sufficiently translated, like Nova, get a new patch proposed nearly every day. Projects that have no translations yet, like Swift, shouldn't get any patches proposed.

Note that only the *.pot files contain location information such as filename and line number; these are the files pushed to Zanata. The translation files (*.po) in git contain neither location information nor empty strings. If you need those, either download the full po file from Zanata or run

Marked for translation
Strings are marked in Python code using the gettext conventions (see also oslo.i18n docu):
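 * `_()` marks a translatable string, e.g. `_("message")`
 * For logging, it's `_LI()` for `LOG.info`, and there's also `_LW()` for `LOG.warning`, `_LE()` for `LOG.error`, and `_LC()` for `LOG.critical`
 * Normal strings like `"message"` are not translated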

Horizon
The translation process is basically the same as mentioned above, except that installation of Horizon is needed to extract strings for Django projects. More details are described in the common script part: project-config/jenkins/common_translation_update.sh.

Documentation projects
For documentation repositories like api-site, ha-guide, openstack-manuals, operations-guide, and security-guide, a separate project is used as well.

The process is the same as for the Python projects with the following differences:
 * Since documentation files do not mark strings, all strings are extracted for translation.
 * Document files live in sub-folders like $PROJECT/doc and $PROJECT/install-guide. PO files generated from RST files are stored in the source/locale/ sub-folder.

Such differences are managed by the scripts mentioned above with the common script part: project-config/jenkins/common_translation_update.sh.

All documentation projects with translations also have a separate Jenkins check that builds the localized manuals, and these are also published on docs.openstack.org.

For building localized documents, the script `doc-tools-check-languages` is used; it lives in the openstack-doc-tools repository. It is configured via `doc-tools-check-languages.conf`. The syncing scripts to our translation server also parse this file.

In `doc-tools-check-languages.conf`, the array `SPECIAL_BOOKS` is used by the syncing scripts. The other values are used by `doc-tools-check-languages`:
 * `skip`: Do not send content of this directory to translation server
 * `RST`: This directory is using RST as document format and therefore needs to be treated differently than the DocBook XML files.

String Freeze
Note that string freeze is for developers. The goal is to give the translators a chance to catch up and not have strings changing all the time. There's no enforcement of string freeze in our infrastructure.

Release handling
Note that the following information is current for the Liberty release (October 2015).

Challenge: How to translate the release branch while the master branch is already open for new releases?

Proposal: Create stable branches for all projects that have current translations with main files (so, not only LOG files).

Goal:
 * Updated translations of files that are sufficiently translated (> X %)
 * Updated po source files that reflect the state of the repository
 * No empty/mainly untranslated language files (< Z % is the rule of thumb)

Project list
The following projects have translations and branches and will get a stable branch:

List of projects that branch with translations:
 * openstack/aodh
 * openstack/ceilometer
 * openstack/cinder
 * openstack/designate-dashboard
 * openstack/django_openstack_auth
 * openstack/glance
 * openstack/heat
 * openstack/horizon
 * openstack/keystone
 * openstack/neutron
 * openstack/nova
 * openstack/swift
 * openstack/zaqar

List of projects branching later:
 * openstack/openstack-manuals - to translate Install Guide

Not translated projects
List of projects that branch without translations in git:
 * openstack/designate
 * openstack/ironic-inspector
 * openstack/magnum
 * openstack/magnum-ui
 * openstack/manila
 * openstack/sahara
 * openstack/searchlight

List of projects that branch with only LOG translations in git:
 * openstack/barbican
 * openstack/glance_store
 * openstack/ironic
 * openstack/trove

List of projects not branching:
 * openstack/api-site
 * openstack/ha-guide
 * openstack/operations-guide
 * openstack/security-doc

List of libraries and clients; we're not covering them in separate branches for now:
 * openstack/oslo.cache
 * openstack/oslo.concurrency
 * openstack/oslo.db
 * openstack/oslo.i18n
 * openstack/oslo.log
 * openstack/oslo.messaging
 * openstack/oslo.middleware
 * openstack/oslo.policy
 * openstack/oslo.reports
 * openstack/oslo.service
 * openstack/oslo.utils
 * openstack/oslo.versionedobjects
 * openstack/oslo.vmware
 * openstack/python-magnumclient
 * openstack/python-openstackclient

Process

 * 1) Before RC1 is cut, translators work with master version.
 * 2) When stable branch is cut, branches get created in Zanata as well. Syncing will get enabled for them.
 * 3) Translators focus on translation of stable branch.
 * 4) Once translators are finished with translations, a copy of new translations from stable branch to master is done in Zanata.
 * 5) Open master version to accept translations.

Note: Around RC1 time, a patch to clean up "old" translations gets proposed to both stable and master to remove any translations in git that are less than Z % translated (we normally only delete files that are less than Y % translated, to avoid too frequent deletion).

New proposal
The following proposal needs buy-in from stable maintainer team and I18n team:


 * 1) Before RC1 is cut, translators work with master version.
 * 2) When stable branch is cut, branches get created in Zanata as well. Syncing to projects will get enabled for them.
 * 3) Translators focus on translation of stable branch. Master branch is made read-only.
 * 4) The release manager will cut an RC2 that includes the current state of the translators' work. RC2 is normally cut 10 days after RC1.
 * 5) Once RC2 is created, a copy of new translations from stable branch to master is done in Zanata.
 * 6) Master branch is made writable again, and translation can now be done for both the stable branch and master.
 * 7) Translations of stable branch will continue to be synced and translators can continue translation.
 * 8) Shortly before RC1 of the next stable release is cut, the stable branch in Zanata is closed. So there is always a single stable branch (the latest release) and master to translate.

Note that after the RC2 merge, there is no automatic copying done from stable branch to master. Translation teams need to translate strings in both master and stable, if they choose to continue translating stable. If a merge is needed, it needs to be requested by the translation team for a specific project.

Note that stable branches will only be created for repositories that already have at least *one* translation of the main file. Repositories with only translated log files or with no translations at all are not branched. We consider a file translated if at least X percent of its messages are translated.
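
The branching rule can be expressed as a short check. This is a sketch under assumptions: the function names are hypothetical, a "main file" is recognized here simply by the absence of `-log-` in the document name, and the threshold X is a parameter because its value is not given in this section.

```python
def is_main_file(document_name):
    """Treat every document without "-log-" in its name as a main file."""
    return "-log-" not in document_name


def gets_stable_branch(translations, min_percent):
    """Return True if at least one main file reaches min_percent.

    translations is an iterable of (document_name, percent_translated) pairs,
    one pair per language file; min_percent plays the role of X in the text.
    """
    return any(
        is_main_file(name) and percent >= min_percent
        for name, percent in translations
    )


if __name__ == "__main__":
    # 75 is a made-up placeholder threshold, as are the percentages below.
    log_only = [("barbican-log-error-translations", 90.0)]
    with_main = [("nova-translations", 80.0), ("nova-log-info-translations", 10.0)]
    print(gets_stable_branch(log_only, 75))
    print(gets_stable_branch(with_main, 75))
```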

Note: Around RC1 time, a patch to clean up "old" translation files gets proposed to master, and to the stable branch if it is already cut, to remove any translations in git that are less than Y % translated (the infra jobs only delete files from git that are less than Y % translated, to avoid too frequent deletion).