Open Source i18n

Not so long ago I found myself in the rather embarrassing position of having authored a translation collaboration tool for an international user base without having implemented the internationalization (i18n) facilities of its platform, Plone. That is, I gave the application an English interface and left it at that. Furthermore, I began to consider how I might extend this application so that the more professional users could leverage their existing translation tools, like the industry-standard Trados, rather than being confined to my application’s Web interface when actually performing a translation (even if a cut-and-paste workaround is available). Correcting the first problem and laying the groundwork for solving the second entailed giving myself a pretty extensive lesson in both open standards and Open Source software for Computer Aided Translation (CAT), which I summarize below.

Translating software interfaces

First, I discovered that there are two aspects of what I’m currently trying to achieve with this tool, each with a different, but related, technological solution. First of all, I need translations for the interface of the tool itself. To accomplish this, Plone makes somewhat customized use of the GNU gettext architecture for i18n of applications. In that model, translations are stored in message catalogs, which match original strings with their target-language translations. The catalogs are kept in files in the .po format, which essentially maintains translations as original/translation key-value pairs. In the case of Plone, a utility called i18ndude handles much of the heavy lifting involved in preparing these files for use with your Plone Product, while a PO file editor like POEdit helps you actually perform the translations.
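To make the key-value idea concrete, here is a minimal Python sketch that reads a tiny PO snippet into a dictionary. The sample strings and the helper names (parse_simple_po, translate) are made up for illustration, and the parser only understands single-line msgid/msgstr pairs. Real PO files also carry multi-line strings, plural forms, and contexts, so a real project would lean on gettext itself or a dedicated PO library rather than something like this.

```python
import re

# A tiny, hypothetical PO snippet (Spanish target language).
SAMPLE_PO = '''\
# Spanish translations for a hypothetical Plone product.
msgid "Save"
msgstr "Guardar"

msgid "Cancel"
msgstr "Cancelar"

msgid "Submit translation"
msgstr ""
'''

# Matches single-line entries such as: msgid "Save"
ENTRY = re.compile(r'^(msgid|msgstr)\s+"(.*)"\s*$')


def parse_simple_po(text):
    """Return {msgid: msgstr} for single-line entries only (a simplification)."""
    catalog = {}
    current_id = None
    for line in text.splitlines():
        match = ENTRY.match(line)
        if not match:
            continue  # skips comments and blank lines
        keyword, value = match.groups()
        if keyword == "msgid":
            current_id = value
        elif current_id is not None:
            catalog[current_id] = value
            current_id = None
    return catalog


def translate(catalog, message):
    """Fall back to the original string when no translation exists yet."""
    return catalog.get(message) or message


catalog = parse_simple_po(SAMPLE_PO)
print(translate(catalog, "Save"))
print(translate(catalog, "Submit translation"))
```

The fallback in translate mirrors the usual gettext behavior: an untranslated message is simply shown in the original language.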

The gettext/PO file approach appears to be the most common solution for translating the interfaces for Linux-based open source projects as well as Web applications (though there are some sui-generis exceptions, such as the Globalize plugin for Ruby on Rails).

So I went ahead and created a Spanish translation for the application, but its current users include translators of French and Portuguese as well, and potentially Italian, Swahili, Greek, and Arabic. I can’t translate the interface into those languages (though I might be able to create a passable Portuguese interface), and I’d like to make it as easy as possible for those who can to help me out. A solution to this problem might be to set up a server tool like Pootle, which lets you visualize progress made on translation of PO files, to help manage the project. Ubuntu/Launchpad’s Rosetta, which is not (yet) open source, appears to be a more sophisticated implementation of the same idea. In conjunction with the server app, I could provide a brief overview of client-side tools that can be used to edit the PO files. I have already mentioned POEdit, but OmegaT, a more ambitious suite of Open Source CAT tools, also has support for PO files.

Translating documents

So, software interface translation is the first order of translation business for my project. The second issue is the translation of documents, which is the purpose of the application itself. Take the documentation of a software program, for example, with an original document in, say, HTML. A translator could simply translate it free-form, using copy-and-paste and whatever tools she happens to have available. But translation-specific editors like that provided by OmegaT–which supports HTML, OpenOffice, and ODF among several other formats–can help present the text in a way that’s more comfortable for translation, in addition to providing access to Translation Memory tools. As long as the program is capable of exporting the translation to the format of the original (which is presumably what would be requested in most translation jobs), this would appear to be the preferable route.

The problem with this process is that translators are likely to receive documents in any number of formats–the tools they use may be more capable of dealing with some formats than with others, and a separate workflow or collaboration process may need to be defined depending on the type of file. This is the problem that XLIFF, an XML format for translations, attempts to address, by providing a set of elements for tracking information common to translation activity. I don’t go into detail on XLIFF here (and in fact have quite a bit more to learn about it), but Trados and other major translation software vendors already include support for XLIFF, and given its emergence as a standard, any software meant to support translation collaboration should also support XLIFF.
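For a rough feel of what an XLIFF file looks like, the Python sketch below parses a small hand-written XLIFF 1.2 document with the standard library and lists its translation units. The file name, the sentences, and the read_units helper are invented for illustration; a real XLIFF file typically carries much more metadata (notes, states, inline markup) than this sketch handles.

```python
import xml.etree.ElementTree as ET

# Namespace of XLIFF 1.2 documents.
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# A hand-written, hypothetical XLIFF 1.2 document with two translation units,
# the second of which has not been translated yet.
SAMPLE_XLIFF = """<?xml version="1.0" encoding="UTF-8"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="manual.html" source-language="en" target-language="es" datatype="html">
    <body>
      <trans-unit id="1">
        <source>Install the package.</source>
        <target>Instale el paquete.</target>
      </trans-unit>
      <trans-unit id="2">
        <source>Restart the server.</source>
      </trans-unit>
    </body>
  </file>
</xliff>
"""


def read_units(xliff_text):
    """Return (id, source, target) tuples; target is None when untranslated."""
    root = ET.fromstring(xliff_text.encode("utf-8"))
    units = []
    for unit in root.iterfind(".//x:trans-unit", NS):
        source = unit.find("x:source", NS)
        target = unit.find("x:target", NS)
        units.append((
            unit.get("id"),
            source.text,
            target.text if target is not None else None,
        ))
    return units


units = read_units(SAMPLE_XLIFF)
for unit_id, source, target in units:
    print(unit_id, source, "->", target)
```

Whatever the original format was (HTML, ODF, and so on), the translator's tool only has to deal with source/target pairs like these, which is the point of the standard.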

Translation Memory (TM)

Finally, both activities–the translation of software interfaces and the translation of documents–benefit from the use of a Translation Memory (TM) system. POEdit can generate a TM for you by letting you import previously existing PO files based on the language(s) you want to work with. OmegaT generates one for you based on the translations you provide while working on your project, but also allows you to import existing TMs based on the TMX standard.
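The core idea of a TM, looking up a new segment against previously translated ones and proposing the closest match, can be sketched in a few lines of Python. This is only an illustration: the memory contents, the best_match helper, and the 0.7 threshold are assumptions, and the similarity score comes from Python's difflib rather than from whatever matching algorithm POEdit or OmegaT actually use.

```python
from difflib import SequenceMatcher

# A hypothetical in-memory TM: (source segment, stored translation) pairs.
MEMORY = [
    ("Save changes", "Guardar cambios"),
    ("Delete this document", "Eliminar este documento"),
    ("Upload a file", "Subir un archivo"),
]


def best_match(segment, memory, threshold=0.7):
    """Return (source, target, score) for the closest stored segment,
    or None when nothing scores at or above the threshold."""
    best = None
    best_score = 0.0
    for source, target in memory:
        # Ratio is 1.0 for identical strings and falls toward 0.0 as they diverge.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best_score:
            best, best_score = (source, target), score
    if best is None or best_score < threshold:
        return None
    return best[0], best[1], best_score


print(best_match("Save changes", MEMORY))       # exact match
print(best_match("Save the changes", MEMORY))   # fuzzy match
print(best_match("Completely unrelated text", MEMORY))
```

An exact match can be reused as-is, while a fuzzy match is only a suggestion that the translator still has to review and adapt.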

Pending further investigation, I’m considering complementing Plone Translation Hub with a repository of TMX files for enhanced collaboration. Because it’s Zope/Plone, Tumatxa might provide a good starting point.

Note that the effectiveness of TM relies on a program’s ability to "segment" a source document properly; SRX is the emerging standard for defining segmentation rules.
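To show why segmentation rules matter, here is a small Python sketch of rule-based sentence splitting. It is loosely modeled on the idea of pairing "break" and "no-break" rules, each with a pattern for the text before and after a candidate position, but it does not read or write actual SRX files. The rule list, the abbreviations in it, and the segment function are all assumptions made for illustration.

```python
import re

# Each rule is (is_break, pattern_before, pattern_after); the first rule that
# matches at a position wins. The abbreviation list is illustrative only.
RULES = [
    (False, r"\b(?:Mr|Mrs|Dr|e\.g|i\.e)\.", r"\s"),  # no break after abbreviations
    (True, r"[.?!]", r"\s+"),                        # break after sentence punctuation
]


def segment(text, rules=RULES):
    """Split text into segments using ordered break / no-break rules."""
    compiled = [
        (is_break, re.compile("(?:" + before + r")\Z"), re.compile(after))
        for is_break, before, after in rules
    ]
    segments = []
    start = 0
    for pos in range(1, len(text)):
        for is_break, before, after in compiled:
            # The rule applies when the text up to pos ends with the "before"
            # pattern and the text from pos onward starts with the "after" pattern.
            if before.search(text[:pos]) and after.match(text, pos):
                if is_break:
                    segments.append(text[start:pos].strip())
                    start = pos
                break  # first matching rule decides; skip the rest
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments


print(segment("Dr. Smith arrived. He sat down. Was it late?"))
```

Without the no-break rule, "Dr." would be cut off as its own segment, and a TM built from such fragments would produce far fewer useful matches.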
