
Translations


Latest revision as of 07:27, 23 June 2017

Note: We switched in September 2015 to using Zanata for the Liberty cycle. Please direct questions to the openstack-i18n mailing list and update the documentation below to fully explain Zanata.

Translation, Internationalization and Localization in OpenStack

OpenStack is committed to broad international support, so making OpenStack usable for all audiences must be an ongoing concern. This includes proper use of internationalization and localization tools by developers, and high-quality translations for both user-facing messages and documentation.

Translation & Management

Let's start with a working definition: translation is the act of taking the written materials in one language and converting them into another language in the most meaningful way possible. In terms of OpenStack, translation happens on both the written documentation and on strings marked for translation in the projects' codebases.

NOTE: For information on how to prepare your code or documentation for translation, see the section on internationalization below.

Zanata

OpenStack uses a Zanata instance running at https://translate.openstack.org/ as its translation management platform.

Downloading translation files

If you wish to download the translation files (.po files) you can do so by selecting the language you're interested in, then clicking on the name of the project resource you wish to download. In the modal dialog which appears, you can use any of the "download" options depending on your use case.

Translating on the site

Translation is most efficiently done right on Zanata's site. You don't need to download any files or applications to get started.

If your language already exists, select a project, select the "master" version, select your language and then select a document to start translating.

Downloading translation files

You can also download translation files and translate locally.

TODO: Explain exactly how this is done with Zanata.

Release cycle

One of the most challenging aspects of managing translations in an Open Source project is handling the interplay between translators and developers during the release cycle. The key piece of this equation is the "string freeze".

String Freeze

NOTE: OpenStack's string freeze happens at the close of the final milestone in the development cycle, giving translators the entire RC period to update translations.

At a predefined time during the release cycle there will be a "string freeze", which means that after this point strings marked for translation in the codebase can no longer be changed except in the case of critical-priority bugs.

Once the string freeze is in effect, the translation files in Zanata can be assumed to be static, and translation efforts should happen in full force. Translation can still happen at any other time, but during the development process strings may change and translation efforts may end up being wasted.

Any changes during the RC period should be carefully vetted to ensure they do not alter or add translation strings, or else coordinated with translators to ensure that changes are handled appropriately.
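
One way to vet a change during the RC period is to compare the set of translatable strings before and after it. The following is a minimal sketch, not part of OpenStack's tooling; the function name `string_freeze_violations` and the sample strings are hypothetical.

```python
def string_freeze_violations(before, after):
    """Report msgids that a change adds or removes.

    A modified string shows up as one removal plus one addition.
    """
    before, after = set(before), set(after)
    return {
        "added": sorted(after - before),
        "removed": sorted(before - after),
    }


# Hypothetical msgid lists taken from the POT file before and after a patch.
old = ["Create volume", "Delete volume"]
new = ["Create volume", "Delete the volume"]
report = string_freeze_violations(old, new)
```

An empty report means the change leaves the translation strings untouched; anything else is a cue to coordinate with translators.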

Check out http://docs.openstack.org/project-team-guide/release-management.html for more details.

Re-incorporating Translations

The OpenStack Infrastructure team has set up automatic generation of reviews for translations so that they can be re-incorporated with minimal effort at any time. For each project where this is set up in our CI infrastructure, a job runs every day. The job regenerates the original POT file, imports all sufficiently translated files, and proposes them to the project as patches. Only files in which 75 percent or more of the strings are translated are downloaded.
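
The 75 percent rule can be pictured with a short sketch. This is an illustration only, not the actual CI job; the helper names and the representation of a catalog as `(msgid, msgstr)` pairs are assumptions made for the example.

```python
def translated_ratio(entries):
    """Fraction of entries with a non-empty translation.

    `entries` is a list of (msgid, msgstr) pairs. The PO header entry,
    which has an empty msgid, is not counted.
    """
    real = [(msgid, msgstr) for msgid, msgstr in entries if msgid]
    if not real:
        return 0.0
    done = sum(1 for _, msgstr in real if msgstr)
    return done / len(real)


def should_import(entries, threshold=0.75):
    """Apply the import threshold described above."""
    return translated_ratio(entries) >= threshold


# Hypothetical catalog: header plus four messages, three of them translated.
catalog = [
    ("", "Content-Type: text/plain; charset=UTF-8"),
    ("Create", "Erstellen"),
    ("Delete", "Löschen"),
    ("Update", "Aktualisieren"),
    ("Rebuild", ""),
]
```

With three of four messages translated the ratio is exactly 0.75, so this hypothetical file would just qualify.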

The list of current open proposed imports is available at review.openstack.org.

Most importantly though, immediately prior to the release of each Release Candidate, and before cutting the Final Release for each version, the translation files should be merged back into their respective projects to make sure they are properly distributed with the release.

At present it is the responsibility of each project's PTL or appointed translation manager to make sure this happens, though OpenStack's release managers, translation team coordinators, etc. are also encouraged to help ensure that this happens smoothly.

Stable Releases and Backports

At present, changes to translations will not be backported to stable release branches. Doing so would require maintaining wholly separate copies of each set of translations and would massively increase the burden on translators.

Internationalization (i18n)

The term internationalization is used to broadly describe coding practices that allow software to be adapted to the linguistic and technical differences of various regions. This includes practices such as marking strings for translation, supporting non-ASCII character sets, etc.
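
As an illustration of marking strings for translation, here is a minimal sketch using Python's standard `gettext` module. The domain name `myproject`, the `locale` directory, and the message text are placeholders. With `fallback=True` and no catalog installed, the original string is returned unchanged.

```python
import gettext

# Look for locale/de/LC_MESSAGES/myproject.mo; fall back to the
# untranslated strings if no such catalog exists.
translator = gettext.translation(
    "myproject", localedir="locale", languages=["de"], fallback=True)
_ = translator.gettext


def not_found_message(name):
    # Mark the whole sentence, then substitute values afterwards, so
    # translators see the complete message with a named placeholder.
    return _("Instance %(name)s could not be found.") % {"name": name}
```

Named placeholders such as `%(name)s` let translators reorder the values when the target language needs a different word order.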

Integrating translations into your project

Python Projects (General)

For most of the OpenStack core projects (and any that use Python), the preferred tools for internationalization are gettext and babel (Debian/Ubuntu package name python-pybabel). Getting started is pretty easy:

Adopt oslo.i18n

The first step is to adopt oslo.i18n in your project; see "How to Use oslo.i18n in Your Application or Library".

Extract messages

Once you have some messages to translate, extract them using Babel. The easiest way is to run `python setup.py extract_messages` in a virtualenv such as py27.

Configure your project to use Babel to easily create your translation files. First, add `Babel` to your requirements.txt file (or wherever you track dependencies). Second, create a `babel.cfg` file in the root of your project; at its simplest it can just contain this line:

[python: **.py]

Finally, add the following to your `setup.cfg` file:

[extract_messages]
keywords = _ gettext ngettext l_ lazy_gettext
mapping_file = babel.cfg
output_file = <project name>/locale/<project name>.pot

That will allow you to run `python setup.py extract_messages` and have it automatically generate the base translation resource file for your project.
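
To give a feel for what message extraction does, the sketch below walks a Python source string and collects the literal first argument of every call to one of the marker functions listed under `keywords`. It is a simplified illustration built on the standard `ast` module, not Babel's implementation; the function names and the sample source are hypothetical, and it captures only the singular form for `ngettext`.

```python
import ast

# The marker functions from the setup.cfg `keywords` line above.
KEYWORDS = {"_", "gettext", "ngettext", "l_", "lazy_gettext"}


def extract_messages(source):
    """Return the sorted, unique string literals passed to marker calls."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in KEYWORDS
                and node.args
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, str)):
            found.add(node.args[0].value)
    return sorted(found)


def to_pot(msgids):
    """Render msgids as bare POT entries with empty translations."""
    entries = []
    for msgid in msgids:
        escaped = msgid.replace("\\", "\\\\").replace('"', '\\"')
        entries.append('msgid "%s"\nmsgstr ""\n' % escaped)
    return "\n".join(entries)


sample = 'print(_("Hello"))\nx = gettext("Bye")\ny = other("Ignored")\n'
messages = extract_messages(sample)
```

Strings passed to functions that are not in `keywords`, such as `other(...)` in the sample, are left out of the template.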

Now you are ready to merge the generated files into your project (see example review). Note that an initial file needs to be imported into your project for the scripts that interact with the translation site.

Set up Zanata server, import and export of translations

Now you are ready to set up Zanata and the CI infrastructure. Read the Infra manual on how to do it.

Horizon (Django)

Django has built-in internationalization tools that go well beyond the basics of `gettext` to ensure proper Unicode support throughout the entire codebase and to make advanced features more accessible. As such, Horizon uses Django's family of `ugettext` functions from `django.utils.translation`. It is preferable to explicitly import the translation function you wish to use:


from django.utils.translation import ugettext, ugettext_lazy  # ..., etc.


For more information on the internationalization tools Django makes available, see the Django i18n Docs.

Documentation (DocBook)

While developer documentation for projects can generally be maintained solely in English, user-oriented documentation such as that produced and maintained by OpenStack's Docs team is also a high priority for translation. This includes installation and administration manuals.

NOTE: For the first release this does not include API documentation. Typically these are sourced in the `openstack-manuals` project.

For specifics on translation of OpenStack Documentation, please refer to the Documentation/Translation page.

What To Translate

At present the convention is to translate all user-facing strings. This means API messages, CLI responses, documentation, help text, etc.

See LoggingStandards#Log_Translation for information about translating log messages.

Exception text should not be marked for translation, because if an exception occurs there is no guarantee that the translation machinery will be functional.

Localization (L10n)

The term localization is used more specifically than internationalization to cover coding practices that allow software's input and output characteristics to adjust to variances in style from region to region. This especially includes number and date formatting.

Dates, Numbers, and Other Concerns

Going beyond what is accomplished by internationalization, the most important aspect to consider is regional differences in formatting for dates and numbers. For example:


Dates:
    04/01/2012 == April 1st, 2012 (US)
    04/01/2012 == January 4th, 2012 (UK)

Numbers:
    1,000.42 == One thousand and 42 hundredths (US)
    1.000,42 == One thousand and 42 hundredths (EU)


Accepting any format and naively passing it into our code would horribly break things. Accepting only one format leaves out large chunks of the world. Therefore, we use localization tools to accept these formats and normalize them into data structures Python can handle universally on input, and to convert them back to the user's expected format for display.
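
The normalize-on-input, format-on-output idea can be sketched with the standard library alone. The `FORMATS` table and the function names below are assumptions made for illustration; a real project would rely on a localization library rather than a hand-written table.

```python
from datetime import date, datetime
from decimal import Decimal

# Hypothetical per-region conventions matching the examples above.
FORMATS = {
    "US": {"date": "%m/%d/%Y", "group": ",", "decimal": "."},
    "UK": {"date": "%d/%m/%Y", "group": ",", "decimal": "."},
    "EU": {"date": "%d/%m/%Y", "group": ".", "decimal": ","},
}


def parse_date(text, region):
    """Normalize a regional date string into a datetime.date."""
    return datetime.strptime(text, FORMATS[region]["date"]).date()


def parse_number(text, region):
    """Normalize a regional number string into a Decimal."""
    fmt = FORMATS[region]
    plain = text.replace(fmt["group"], "").replace(fmt["decimal"], ".")
    return Decimal(plain)


def format_number(value, region):
    """Render a Decimal using the region's separators."""
    fmt = FORMATS[region]
    us_style = "{:,.2f}".format(value)
    # Swap separators through a placeholder so they do not collide.
    return (us_style.replace(",", "\0")
            .replace(".", fmt["decimal"])
            .replace("\0", fmt["group"]))


us_date = parse_date("04/01/2012", "US")
uk_date = parse_date("04/01/2012", "UK")
amount = parse_number("1.000,42", "EU")
```

The same input string yields two different dates depending on the region, which is why the region has to be known before parsing.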

Another less common (for OpenStack) issue related to localization revolves around name formats, which vary culturally. The western style of "first name" and "last name" doesn't fit many cultural naming conventions. This isn't something a software tool can account for, so for problems such as these the best solution is to simply accept the broadest range of inputs (e.g. a single "name" field).

How To Localize Your Project

Horizon (Django)

Horizon has excellent localization tools available since it is built on top of the Django web framework. Most conversions happen automatically when the localization framework is active. Full support for a localized user dashboard experience is a high-priority feature.

Other OpenStack Projects

Python's `locale` and `gettext` modules offer most of the tools necessary to localize a Python project with some effort. More information on this will be added in the future.
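
As one small example of what the `gettext` module provides, plural forms can be selected with `ngettext`. This sketch uses `gettext.NullTranslations`, which applies the simple singular-if-one rule when no catalog is loaded; the message text and the function name are placeholders.

```python
import gettext

# No catalog is loaded, so the source strings are returned as written.
catalog = gettext.NullTranslations()


def files_deleted(count):
    # ngettext chooses between the singular and plural message.
    template = catalog.ngettext(
        "%(count)d file deleted", "%(count)d files deleted", count)
    return template % {"count": count}
```

With a real catalog loaded, the plural rule comes from the catalog's header, so languages with more than two plural forms are handled by the same call.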


Translation infrastructure

The translation infrastructure and workflow are documented on the Translations/Infrastructure page.