DelayedMessageTranslation

Enable delayed translation through a Message object

OpenStack currently translates messages immediately into the server's local locale. This is problematic for two use cases:

As an OpenStack technical support provider, I need to get log messages in my locale so that I can debug and troubleshoot problems.
As an OpenStack API user, I want responses in my locale so that I can interpret the responses.

To solve these issues, we propose enabling delayed translation by creating a new Oslo object that saves away the original text and injected information to be translated at output time. When these messages reach an output boundary, they can be translated into the server locale to mirror today's behavior or to a locale determined by the output mechanism (e.g. log handler or HTTP response writer).

Design

Overview

The current way of translating messages in OpenStack is to immediately run in-code strings through a _() translation function, often followed by a format string operation that injects other values into the newly translated string. For example:

logging.warn(_("Starting %s node"), topic)

We want to save the original string ('Starting %s node') and the injected information (the 'topic' variable) so that we can translate them to a specific locale later during execution, or perhaps at a different layer. This makes it possible to use the Accept-Language header at the API request level to translate API response messages. More generally, this delayed form of translation allows us to translate to different locales depending on the output mechanism. Building on this, we propose a new log adapter or handler that writes log files in specific locales, which covers the other use case above.
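
As a rough illustration of the API case, the sketch below picks a target locale from a raw Accept-Language header value. The helper name best_locale and the list of available locales are hypothetical and not part of any OpenStack API; a real service would more likely rely on its web framework's header parsing.

```python
def best_locale(accept_language, available, default="en_US"):
    """Pick the most preferred available locale (hypothetical helper).

    accept_language is a raw header value such as
    "de-DE,de;q=0.8,en;q=0.5"; available is a list like ["en_US", "de_DE"].
    """
    candidates = []
    for position, item in enumerate(accept_language.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip()
        quality = 1.0
        for param in parts[1:]:
            name, _sep, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        # A quality of 0 means "not acceptable", so such entries are skipped.
        if tag and quality > 0:
            candidates.append((-quality, position, tag.replace("-", "_")))

    # Highest quality first; ties keep the order given in the header.
    for _neg_quality, _position, tag in sorted(candidates):
        language = tag.split("_")[0].lower()
        # Prefer an exact match, then any locale with the same language.
        for locale in available:
            if locale.lower() == tag.lower():
                return locale
        for locale in available:
            if locale.split("_")[0].lower() == language:
                return locale
    return default
```

The returned locale would then be handed to whatever translates the delayed message when the HTTP response is written.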

There is a Launchpad blueprint against Nova covering API translation using the Accept-Language header and a related change: https://blueprints.launchpad.net/nova/+spec/user-locale-api

Implementing Delayed Translation

To implement delayed translation, the current translation function ( _() ) can return a Python object that wraps the message and implements most string-like functionality. By overriding the '%' operator and the .format function on that object, we can also remember and encapsulate the injected information. When these 'Message' objects reach an output boundary of the system, we can translate them to a default locale, such as the system locale, mirroring today's behavior, or to a locale bound to the output mechanism, such as an API or log handler. A quick stub implementation:

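def _delayed_translate(message):
    # Wrap the untranslated text instead of translating it immediately.
    return Message(message)


class Message(object):
    def __init__(self, message):
        self.message = message    # original, untranslated text
        self.parameters = None    # values injected later via '%'

    def __mod__(self, other):
        # Remember the injected values; formatting happens after translation.
        self.parameters = other
        # Return the Message so that `_("...") % value` does not yield None.
        return self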
    
...

The above code is just a small part of the overall Message class that would be used to encapsulate the messages and extra information. Injected parameters would most likely need to be resolved to strings or deep copied to carry them along with the message for later injection via format string after translation.
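
To make the idea more concrete, here is a slightly fuller sketch under stated assumptions: the domain name "nova", the translate() method, and the TranslatingHandler class are illustrative names rather than existing Oslo code, and parameters are assumed to be deep-copyable. With fallback=True, gettext returns the original text when no catalog is found for the requested locale.

```python
import copy
import gettext
import logging


class Message(object):
    """Holds the original text and parameters until output time."""

    def __init__(self, msgid, domain="nova"):
        self.msgid = msgid
        self.domain = domain
        self.params = None

    def __mod__(self, other):
        # Return a new object and copy the parameters, so that later
        # mutation of the caller's data cannot change the output.
        clone = Message(self.msgid, self.domain)
        clone.params = copy.deepcopy(other)
        return clone

    def translate(self, locale=None, localedir=None):
        # languages=None lets gettext consult the process environment,
        # which mirrors today's "server locale" behavior.
        languages = [locale] if locale else None
        catalog = gettext.translation(self.domain, localedir=localedir,
                                      languages=languages, fallback=True)
        text = catalog.gettext(self.msgid)
        return text if self.params is None else text % self.params

    def __str__(self):
        return self.translate()


class TranslatingHandler(logging.Handler):
    """Translates Message objects to one locale, then delegates."""

    def __init__(self, target, locale):
        super().__init__()
        self.target = target
        self.locale = locale

    def emit(self, record):
        if isinstance(record.msg, Message):
            # Work on a copy so other handlers still see the Message.
            record = copy.copy(record)
            record.msg = record.msg.translate(self.locale)
        self.target.handle(record)
```

Wrapping an ordinary handler, for example TranslatingHandler(logging.FileHandler("nova-de.log"), "de_DE"), would then produce a log file in that locale, provided a matching catalog is installed; the file name here is only an example.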

Problems

Domain Change

While experimenting with this change in Nova, we found several obstacles that would need to be addressed. The first has come to be known by some as the 'domain change issue'. It arises from the OpenStack projects' use of gettext.install(…) to install the _() function into Python's __builtin__ namespace. The __builtin__ namespace is shared by all modules running in the current Python process, including libraries that Nova or another project may call into. As a result, translations in those libraries are incorrect, since their message strings will most likely not exist in Nova's translation file. The reverse is also a problem: other libraries can modify the __builtin__ namespace by calling gettext.install(…) themselves (or by manipulating the _ function in other ways), which breaks Nova's translations when flow of control returns to Nova. There is a Launchpad bug open against Nova that is meant to address this problem: https://bugs.launchpad.net/nova/+bug/1150194
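
The effect can be reproduced in a few lines. This sketch assumes Python 3, where the shared namespace is the builtins module rather than __builtin__. FakeCatalog and its entries are made-up stand-ins for compiled catalogs, and calling install() on a translations object is what gettext.install(…) does internally.

```python
import builtins
import gettext


class FakeCatalog(gettext.NullTranslations):
    """Stand-in for a compiled .mo catalog (hypothetical test data)."""

    def __init__(self, entries):
        super().__init__()
        self.entries = entries

    def gettext(self, message):
        # Unknown strings come back untranslated, as with real catalogs.
        return self.entries.get(message, message)


nova_catalog = FakeCatalog({"Starting %s node": "Starte %s-Knoten"})
library_catalog = FakeCatalog({"Connection lost": "Verbindung verloren"})

# Nova installs its domain: _ now resolves against Nova's catalog.
nova_catalog.install()
nova_result = builtins._("Starting %s node")

# A library installs its own domain later and silently replaces _.
library_catalog.install()
clobbered_result = builtins._("Starting %s node")

# Remove the shared name again so the demonstration leaves no trace.
del builtins._
```

After the second install(), Nova's string is looked up in the library's catalog, is not found, and comes back untranslated.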

We were able to fix this problem by removing the gettext.install(…) call from the __init__.py file at the base of every project, and instead importing our own wrapper around the gettext translation function as _ into each module that currently uses _ to translate a string. For example in Nova:

# two lines because nova doesn't allow importing functions directly
from nova import gettextutils
_ = gettextutils._

This locks down each project (nova, glance, cinder, etc.) to always use its own translation domain and files, since it always uses the gettextutils defined in its own base package.

A recent change to remove the gettext.install() from the __init__.py files is viewable here: https://review.openstack.org/#/c/25823/

String Handling

Another issue discovered while experimenting with delayed translation in Nova was string handling throughout the message flow path. Much of the code in Nova currently expects the values returned by _() to be strings, which makes it difficult, and in some cases impossible, to pass another Python object off as string-like and have it survive all the way to the output mechanism. Problematic cases included code that iterates over the Message object, code that calls .split() on it, serialization to JSON and back, and formatting to UTF-8 vs. ASCII. Most of these problems could be fixed if the affected code were changed to expect and handle Message objects, or simply to treat the string or Message as a scalar value instead of trying to manipulate it.
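
The JSON case illustrates the kind of change involved. The standard json module rejects objects it does not know, but a custom encoder can render Message objects at the serialization boundary. The Message class, its render() method, and MessageEncoder below are simplified, hypothetical stand-ins written for this example.

```python
import json


class Message(object):
    """Minimal stand-in for the delayed-translation object."""

    def __init__(self, msgid, params=None):
        self.msgid = msgid
        self.params = params

    def render(self):
        # A real implementation would translate self.msgid here first.
        if self.params is None:
            return self.msgid
        return self.msgid % self.params


class MessageEncoder(json.JSONEncoder):
    def default(self, obj):
        # json calls default() only for types it cannot serialize itself.
        if isinstance(obj, Message):
            return obj.render()
        return super().default(obj)


body = {"error": Message("Instance %s not found", "abc")}
encoded = json.dumps(body, cls=MessageEncoder)
```

Without cls=MessageEncoder, json.dumps(body) raises TypeError, which is the failure mode described above.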

Alternative Implementation

A less invasive workaround is to keep a dictionary for reverse lookups of translations in a custom gettextutils module. Every call to _() either returns a string directly after recording the translation and the initial message, or, in the case of a format string, returns a string object that records the injected parameters when they are applied. Custom output handlers (e.g. a custom log handler) use the dictionary to look up a possible reverse translation and retrieve the initial string and any extra information. This is then used to re-translate the message to the locale bound to the log handler and to inject the extra information before forwarding the newly translated message to the underlying log handler.

However, this approach has significant drawbacks, such as the need to clean up the dictionary, thread safety, and the inability to work across processes, that make it unsuitable as a strategic approach.