Who Works for Localization Pack: XML or Java Resource Bundle? : Introduction

There are various ways to implement a localization pack. The two most common approaches are XML and the Java resource bundle. As the standard method of message separation since JDK 1.1, the resource bundle is widely used by software written in Java because of its high performance and flexibility. However, more and more applications adopt XML as the source format of localization packs for several reasons: it is cross-platform, it has a flexible structure, it supports Unicode, and it is associated with both mature and emerging XML technologies. Customers often ask which one they should choose. It is hard to compare them and give an absolute answer, since each has its own advantages and applicable scenarios. Sometimes the two can be combined, depending on the particular scenario and application environment. This article addresses the strengths and weaknesses that influence how a localization pack is realized, the factors to consider when designing one, and feasible implementations derived from the two techniques.
Shu Bei
Globalization Certification Lab
Localization pack
The first step in globalizing your program is message separation. The separation and translation of messages make up a significant percentage of the globalization effort. Message separation is a standardized approach that lets software support multiple languages and locales through one executable. A localization pack is a software component that supplies all the localizable material needed for the executable to work in a specific language. Localization packs are composed of translatable material, including the strings on GUI panels, application pages, menu items, errors, warnings, informational messages, help files, language-specific audio/video, and region-specific icons/images.
The implementation of a localization pack differs across platforms and programming languages. Generally, a localization pack can be realized as resource-only DLLs, database objects, flat files, resource bundles, or XML. In the on demand world of Java, whether to use XML or resource bundles for localization packs has become a popular topic of debate.
Localization packs in XML
Consider a "log on" page. Here, we separate three strings from that page and organize them into localization packs, as in Figure 1 (English) and Figure 2 (Simplified Chinese).
Figure 1: logon_en_US.xml
Figure 2: logon_zh_CN.xml
A program parses the XML files, retrieves the required strings, and inserts them into the corresponding locations of the UI page. DOM and SAX are the most widely used XML parser interfaces.
DOM (Document Object Model) defines the programming interface for XML and HTML. A DOM parser loads the complete input document into memory as a tree of nodes. DOM offers the advantage of easy access to any part of the document at any time. You can insert, delete, modify, and rearrange the document in any form.
The main drawback of DOM is that it can consume a lot of memory if the document is large and has a complex structure. It is also often slower than a SAX parser.
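As an illustration of the DOM approach, here is a minimal sketch that loads a pack into a tree and copies its strings into a map. The pack layout (a pack element holding message elements with a key attribute), the key names, and the class name DomPackReader are assumptions made for this example, not the format shown in Figures 1 and 2.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomPackReader {
    // Hypothetical pack layout; the real logon_en_US.xml may be organized differently.
    static final String LOGON_EN_US =
        "<pack locale=\"en_US\">"
        + "<message key=\"logon.title\">Log on</message>"
        + "<message key=\"logon.userid\">User ID</message>"
        + "<message key=\"logon.password\">Password</message>"
        + "</pack>";

    // Builds the complete DOM tree in memory, then collects every message by key.
    static Map<String, String> load(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("message");
            Map<String, String> messages = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element message = (Element) nodes.item(i);
                messages.put(message.getAttribute("key"), message.getTextContent());
            }
            return messages;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse localization pack", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(load(LOGON_EN_US).get("logon.title"));
    }
}
```

Because the whole tree is held in memory, the same Document could also be modified and written back, which is the flexibility described above.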
An XML localization pack manager may use DOM-based techniques such as XPath and XSLT. Retrieving a single string from the localization pack XML with XPath, or transforming the whole localization pack XML with an XSL file, are both feasible and convenient ways to use XML localization packs.
SAX (Simple API for XML) is event-driven: the program defines handler functions that are triggered when an element, a run of characters, or any other part of the XML document is read. Its memory footprint is usually much smaller than that of DOM, but it provides sequential access only. You must keep track of any inherited values passed from parent elements to their children.
A growing number of XML libraries that offer both interfaces are available for many different programming and scripting languages.
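Retrieving a single string with XPath might look like the sketch below. It uses the standard javax.xml.xpath API; the pack layout, the key names, and the class name XPathLookup are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathLookup {
    // Hypothetical pack layout, assumed for this example only.
    static final String PACK =
        "<pack>"
        + "<message key=\"logon.title\">Log on</message>"
        + "<message key=\"logon.userid\">User ID</message>"
        + "</pack>";

    // Returns the text of the message with the given key, or "" if there is none.
    static String lookup(String xml, String key) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // The sketch assumes keys never contain a single quote.
        String expression = "/pack/message[@key='" + key + "']";
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad lookup for key " + key, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(lookup(PACK, "logon.userid"));
    }
}
```

This version re-parses the document on every call, which keeps the example short; a real manager would probably parse once and reuse the tree.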
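The event-driven style can be sketched as follows. The handler has to remember which message element it is currently inside, which is the kind of state tracking mentioned above. The pack layout and the class name SaxPackReader are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxPackReader {
    // Hypothetical pack layout, assumed for this example only.
    static final String LOGON_EN_US =
        "<pack>"
        + "<message key=\"logon.title\">Log on</message>"
        + "<message key=\"logon.password\">Password</message>"
        + "</pack>";

    static Map<String, String> load(String xml) {
        Map<String, String> messages = new HashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            private String currentKey;                        // key of the open message element
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("message".equals(qName)) {
                    currentKey = attrs.getValue("key");
                    text.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (currentKey != null) {
                    text.append(ch, start, length);           // text may arrive in several chunks
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("message".equals(qName)) {
                    messages.put(currentKey, text.toString());
                    currentKey = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse localization pack", e);
        }
        return messages;
    }
}
```

No tree is kept after parsing; only the resulting map stays in memory.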
Localization packs in Java resource bundle
Java introduced a message separation mechanism with JDK 1.1, and it remains the standard method of message separation. The class that provides this capability is java.util.ResourceBundle. ResourceBundle is an abstract class; its subclasses java.util.ListResourceBundle and java.util.PropertyResourceBundle are used in practice to create simple message files, that is, localization packs. In its simplest form, a message file is a list of key and message string pairs. The core program loads the messages as needed at run time, using keys to locate a particular message. The program can support as many languages as there are translations.
PropertyResourceBundle is used for property files. A property file is a flat, editable text file, as shown in Figures 3 and 4. The drawback is that it cannot hold Unicode text directly: for non-Latin-1 languages, the file must be run through native2ascii to convert the characters to Unicode escape sequences.
Property file localization pack on a "log on" page:
Figure 3: LogonResource.properties
Figure 4: LogonResource_zh_CN.properties
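To make the property file format concrete, the sketch below reads property text with Unicode escapes through PropertyResourceBundle. The keys and the Chinese strings are assumptions for illustration, not the content of Figures 3 and 4, and the text is held in a string constant only so that the example is self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

class PropertyPackDemo {
    // Hypothetical content of a zh_CN property file after conversion to Unicode escapes.
    // The doubled backslashes keep the escapes as literal text in this Java string.
    static final String LOGON_ZH_CN_PROPERTIES =
        "# Log on page\n"
        + "logon.title=\\u767b\\u5f55\n"
        + "logon.userid=\\u7528\\u6237\\u540d\n"
        + "logon.password=\\u5bc6\\u7801\n";

    // PropertyResourceBundle decodes the escapes while loading the key=value pairs.
    static ResourceBundle load(String propertiesText) {
        try {
            return new PropertyResourceBundle(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ResourceBundle bundle = load(LOGON_ZH_CN_PROPERTIES);
        System.out.println(bundle.getString("logon.title").length()); // two characters
    }
}
```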
ListResourceBundle holds the messages in a Java class that is compiled to bytecode after each edit. Thus, its performance is better than that of a property resource bundle.
Figure 5: LogonResource.java
Figure 6: LogonResource_zh_CN.java
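A ListResourceBundle pair could be sketched as below. The keys and strings are assumed for illustration and are not taken from Figures 5 and 6. The two bundles are nested in one demo class only to keep the example in a single file; in a real project each would normally be a public top-level class in its own source file so that ResourceBundle.getBundle can find it by name.

```java
import java.util.ListResourceBundle;
import java.util.ResourceBundle;

class ListPackDemo {
    // Base (English) bundle: a compiled table of key and message pairs.
    public static class LogonResource extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                { "logon.title", "Log on" },
                { "logon.userid", "User ID" },
                { "logon.password", "Password" },
            };
        }
    }

    // Simplified Chinese bundle, written with Unicode escapes in the source.
    public static class LogonResource_zh_CN extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                { "logon.title", "\u767b\u5f55" },
                { "logon.userid", "\u7528\u6237\u540d" },
                { "logon.password", "\u5bc6\u7801" },
            };
        }
    }

    public static void main(String[] args) {
        ResourceBundle english = new LogonResource();
        System.out.println(english.getString("logon.title"));
    }
}
```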
For non-Latin-1 languages, the ListResourceBundle source file is run through native2ascii to convert the characters to Unicode escape sequences before compilation.
Programs use the method ResourceBundle.getBundle(basename, locale) to load the bundle and use getString(keyname) to get the corresponding string.
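The two calls can be tried out with the sketch below. To stay self-contained it writes two small property files to a temporary directory and loads them through a URLClassLoader, using the getBundle overload that also takes a class loader; in an application the files would simply sit on the class path. The file contents, the key name, and the class name GetBundleDemo are assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.ResourceBundle;

class GetBundleDemo {
    // Writes hypothetical packs to a temp directory and returns a loader that can see them.
    static ClassLoader writePacks() {
        try {
            Path dir = Files.createTempDirectory("l10n-packs");
            Files.write(dir.resolve("LogonResource.properties"),
                    "logon.title=Log on\n".getBytes(StandardCharsets.US_ASCII));
            Files.write(dir.resolve("LogonResource_zh_CN.properties"),
                    "logon.title=\\u767b\\u5f55\n".getBytes(StandardCharsets.US_ASCII));
            return new URLClassLoader(new URL[] { dir.toUri().toURL() });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Loads the bundle for the locale, then fetches one string by key.
    static String title(ClassLoader loader, Locale locale) {
        ResourceBundle bundle = ResourceBundle.getBundle("LogonResource", locale, loader);
        return bundle.getString("logon.title");
    }

    public static void main(String[] args) {
        ClassLoader loader = writePacks();
        System.out.println(title(loader, Locale.ROOT));
        System.out.println(title(loader, Locale.SIMPLIFIED_CHINESE).length());
    }
}
```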
Java has a complete mechanism for handling resource bundles. In addition to basic bundle location and message retrieval, the Java resource bundle provides developers with extended functions such as compound message handling and automatic locale fallback. The resource bundle mechanism is supported in different ways depending on the environment. For example, IBM WebSphere® Portal enables the use of resource bundles in JSP through the portlet taglib.
Comparison of XML and Java
XML and Java resource bundles can both be used to implement localization packs. Several key factors influence the choice of one over the other and should be considered when designing a localization pack.
Platform
The first consideration in choosing between XML and the Java resource bundle is the platform the application is built on. Java resource bundles are only applicable on Java-related platforms. XML, as an industry standard, is platform independent and can be used with a variety of software and programming languages as long as a supported parser is available. From this standpoint, XML is a better choice because it works on varied platforms.
Unicode support
Unicode is a significant part of globalization. Both XML and the resource bundle support Unicode, but in different ways. XML accommodates Unicode characters explicitly. The default encoding of XML is UTF-8, which is convenient for editing, modifying, and exchanging files.
A Java resource bundle must be run through the native2ascii command to convert the original non-Latin-1 characters to Unicode escape sequences before it is compiled into the final software. The extra step has a negative impact on the design, development, and maintenance of localization packs.
XML is the more convenient choice for Unicode support; the Java resource bundle is better suited to localization pack files that don't need frequent updates.
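The effect of the conversion step can be approximated with a few lines of code. The sketch below is not the native2ascii tool itself; it is a simplified stand-in, under the assumption that every character outside the ASCII range is replaced with an escape sequence, and the class name UnicodeEscaper is made up for the example.

```java
class UnicodeEscaper {
    // Replaces each non-ASCII UTF-16 code unit with a backslash, the letter u,
    // and four hexadecimal digits; ASCII characters are copied unchanged.
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The value holds two Chinese characters, written as escapes in this source file.
        System.out.println(escape("logon.title=\u767b\u5f55"));
    }
}
```

A translator has to repeat this step after every edit of the native-language file, which is the maintenance cost described above.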
File Structure
Applications have different requirements for the internal organization of localization pack files. The hierarchical structure of XML is flexible enough to accommodate all kinds of structured data, defined and validated by a DTD or schema. As a self-describing language, XML can embed varied information for translators or as descriptions. ListResourceBundle and PropertyResourceBundle are flat structured and unsuitable for hierarchical data.
Authoring
The generation, editing, modification, and organization of localization pack files require editing tools. XML and property files are both text files and can be edited with text editors. XML currently has more flexible and easier-to-use editing tools.
Translation
Translation tools such as IBM TM2 support both XML and Java resource bundles. Because of the simple structure of property files, translation tools can offer only basic translation functions for them. XML's self-describing nature and validation support enable tools to offer more translation functions. For example, translation tools can validate XML documents on import by using DTDs or schemas. To aid translation tools in formatting, XML documents can contain the following information (or localization properties):
Which elements and attributes are translatable
Which elements are inline (i.e., should be included within the text, like a in XHTML)
Which elements have content that is preformatted
Whether any element needs to be treated with different rules (i.e., scripts)
Currently, there is no standard way of defining XML localization properties. The creation of a standard would allow XML users to easily define the localization properties of their vocabulary regardless of the localization tool used. Resource bundles do not have the capability to support the definition of such a standard.
Development
The development of a localization pack includes preparing the localization pack files, coding the localization pack manager (code to locate, access, and manage localization packs), and replacing the original messages with calls to the localization pack manager. A localization pack manager should also implement fallback (for scenarios in which the localization pack files for the specified locale are missing or do not exist at all). A more complex replacement may involve a compound message with parameters.
Message separation
Separation and translation take up a large percentage of globalization development time. Message separation tools automatically extract translatable messages from the program and replace them with localization pack manager code. This function is available in some Java IDE tools such as VisualAge for Java (also called JIT, the Java internationalization tool). The tool extracts the messages into property files and replaces them with code to read them, greatly facilitating the globalization development process.
No standard XML localization pack manager is currently defined, because of XML's varied file structures and the varied ways applications handle XML localization files through their parsers. In the Globalization Certification Lab, we wrote a tool that extracts strings from JSP files into an XML file with a unified format within one project. The tool can be reused if the localization environment is similar (that is, when we globalize JSP with the same XML localization pack format).
Compound message handling
Localization packs contain messages that are fixed: the strings are read from the XML or property files and displayed as they are. Localization packs may also contain compound messages, which are constructed with parameters at run time. Java supports compound messages well: a pattern stored in a resource bundle can be filled in with java.text.MessageFormat. Programmers can use this mechanism directly with little extra coding effort.
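A compound message might be handled as in the sketch below, which combines a resource bundle with java.text.MessageFormat. The key, the pattern text, and the class name CompoundMessageDemo are assumptions for illustration.

```java
import java.text.MessageFormat;
import java.util.ListResourceBundle;
import java.util.Locale;
import java.util.ResourceBundle;

class CompoundMessageDemo {
    // Hypothetical bundle holding one pattern with two placeholders.
    static final ResourceBundle MESSAGES = new ListResourceBundle() {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                { "logon.welcome", "Welcome, {0}. You have {1} new messages." },
            };
        }
    };

    // Fetches the pattern by key and substitutes the run-time parameters.
    static String welcome(String user, int newMessages) {
        MessageFormat format = new MessageFormat(MESSAGES.getString("logon.welcome"), Locale.US);
        return format.format(new Object[] { user, newMessages });
    }

    public static void main(String[] args) {
        System.out.println(welcome("Shu", 3));
    }
}
```

Because the placeholders are numbered, a translator can reorder them in the translated pattern without any change to the calling code.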
Automatic locale fall back
It is fairly challenging for an application to support all the existing languages of the world. Usually, only the major languages are supported. Applications need to handle the case where a client requires a language that is not currently supported, or where the localization pack files are unexpectedly lost. The Java resource bundle contains a mechanism for locale fallback: when the localization pack cannot be found for the required language, it falls back to the parent locale or the default locale.
Because the Java technique is mature, using resource bundles for localization packs saves development effort. There are many XML parsers that manipulate XML very well, but currently there is no standard XML localization management tool or technique to support XML localization packs. Developers need to write such localization pack managers from scratch or build them on third-party XML software such as Xerces and Xalan from the Apache projects.
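The fallback behavior can be observed with the sketch below. It writes a complete base pack and a deliberately incomplete zh_CN pack to a temporary directory. One deviation from the default behavior is made on purpose: the lookup uses ResourceBundle.Control.getNoFallbackControl so that the default-locale step is skipped and the result does not depend on the machine the example runs on. File contents, keys, and the class name FallbackDemo are assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.ResourceBundle;

class FallbackDemo {
    // Base pack has both keys; the zh_CN pack is missing logon.userid.
    static ClassLoader writePacks() {
        try {
            Path dir = Files.createTempDirectory("l10n-fallback");
            Files.write(dir.resolve("LogonResource.properties"),
                    "logon.title=Log on\nlogon.userid=User ID\n".getBytes(StandardCharsets.US_ASCII));
            Files.write(dir.resolve("LogonResource_zh_CN.properties"),
                    "logon.title=\\u767b\\u5f55\n".getBytes(StandardCharsets.US_ASCII));
            return new URLClassLoader(new URL[] { dir.toUri().toURL() });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String lookup(ClassLoader loader, Locale locale, String key) {
        // Skip the default-locale step so the outcome is the same on every machine.
        ResourceBundle.Control control =
                ResourceBundle.Control.getNoFallbackControl(ResourceBundle.Control.FORMAT_PROPERTIES);
        return ResourceBundle.getBundle("LogonResource", locale, loader, control).getString(key);
    }

    public static void main(String[] args) {
        ClassLoader loader = writePacks();
        // No French pack exists, so the base pack answers.
        System.out.println(lookup(loader, Locale.FRANCE, "logon.title"));
        // The key is missing from the zh_CN pack, so the parent (base) pack answers.
        System.out.println(lookup(loader, Locale.SIMPLIFIED_CHINESE, "logon.userid"));
    }
}
```

With the plain getBundle call, a missing French pack would first be looked up under the default locale of the JVM before the base pack is used.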
Performance
Performance may be the most important factor to consider. Parsing and searching an XML document tree are time consuming. A cache mechanism may improve performance to some extent, but the improvement is not significant.
The Java resource bundle is well known for high performance. Key lookup is quite fast in flat-structured property files. ListResourceBundle is faster than PropertyResourceBundle because it is compiled and runs as bytecode.
Run time update
An XML localization pack offers more flexibility for run-time updates. Updates to the localization pack files can take effect promptly without restarting the application, which allows a new language to be plugged in at run time without the end user noticing. The Java resource bundle mechanism loads the property files once and caches them, so by default an update requires a restart. For ListResourceBundle, an update also requires rebuilding the application to compile the localization packs into it.
Implementation
XML and resource bundles each have strengths and weaknesses. When designing a localization pack, it is best to combine their advantages to reach the optimum solution. Here are some suggestions:
Suggestion 1 - XML with its parser
Description: XML is used as both the source format and the run-time format for localization files. The localization pack manager works as the parser to retrieve individual strings from the localization files.
Advantage: Convenient for editing, translation, and run-time update.
Disadvantage: Poor performance.
Suggestion 2 - XML and XSLT
Description: XML is used as both the source format and the run-time format for localization files. The localization pack manager uses an XSL file to transform the XML and compose the final page. This approach is good for help files.
Advantage: Convenient for editing, translation, and run-time update of both the localization pack XML files and the XSL files.
Disadvantage: Poor performance.
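Suggestion 2 could be sketched with the standard javax.xml.transform API as below. The pack layout, the stylesheet, and the class name XsltPageComposer are hypothetical and much smaller than a real page would be.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltPageComposer {
    // Hypothetical localization pack.
    static final String PACK =
        "<pack>"
        + "<message key=\"logon.title\">Log on</message>"
        + "<message key=\"logon.userid\">User ID</message>"
        + "</pack>";

    // Hypothetical page template: pulls strings out of the pack by key.
    static final String PAGE_XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"/pack\">"
        + "<html><body>"
        + "<h1><xsl:value-of select=\"message[@key='logon.title']\"/></h1>"
        + "<p><xsl:value-of select=\"message[@key='logon.userid']\"/></p>"
        + "</body></html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to the pack and returns the composed page as text.
    static String compose(String packXml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter page = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(packXml)), new StreamResult(page));
            return page.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Cannot transform localization pack", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(compose(PACK, PAGE_XSL));
    }
}
```

Switching languages then means feeding a different pack to the same stylesheet.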
Suggestion 3 - Property Resource bundle
Description: Property files are used as the localization pack files.
Advantage: High performance, editable, convenient for development.
Disadvantage: Conversion necessary for non-Latin-1 languages. Restart required for run-time update.
Suggestion 4 - List resource bundle
Description: ListResourceBundle classes are used as the localization pack files.
Advantage: High performance, convenient for development.
Disadvantage: Conversion necessary for non-Latin-1 languages. Rebuild required for update.
Suggestion 5 - XML to resource bundle at build time
Description: XML is used as the source format for localization pack generation and translation, and is converted to property files at application build time.
Advantage: Convenient for editing and translation; easy for development; high performance.
Disadvantage: Conversion tool necessary. Restart required for run-time update.
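The conversion tool that this suggestion requires could start from something like the sketch below. It reads a pack with DOM and writes property file text with java.util.Properties, whose stream-based store method also produces the Unicode escapes. The pack layout and the class name PackConverter are assumptions; a build script would write the result to a file named after the locale.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PackConverter {
    // Hypothetical Simplified Chinese pack; the strings are escaped in this source file.
    static final String LOGON_ZH_CN =
        "<pack locale=\"zh_CN\">"
        + "<message key=\"logon.title\">\u767b\u5f55</message>"
        + "<message key=\"logon.password\">\u5bc6\u7801</message>"
        + "</pack>";

    // Converts the XML pack into the text of an equivalent property file.
    static String toProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("message");
            Properties properties = new Properties();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element message = (Element) nodes.item(i);
                properties.setProperty(message.getAttribute("key"), message.getTextContent());
            }
            // Storing to a byte stream escapes every character outside printable ASCII.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            properties.store(out, "Generated from an XML localization pack");
            return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot convert localization pack", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(toProperties(LOGON_ZH_CN));
    }
}
```

The generated text starts with comment lines (the header and a timestamp), and the order of the keys is not guaranteed.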
Suggestion 6 - XML to resource bundle at setup time
Description: XML is used as the source format for localization pack generation and translation. The XML files are converted to property files or class files during installation.
Advantage: Convenient for editing and translation; easy for development; high performance.
Disadvantage: Conversion tool necessary. Reinstallation required for update.
Conclusion
The resource bundle is limited to the Java world, and the flat, simple structure of its message files makes it difficult to extend with more functionality in the future. XML, by contrast, is a standard language across the IT industry, and more tools and techniques are being developed to support XML localization. One example is XLIFF (XML Localization Interchange File Format), an exchange format for translatable data defined by a group driven by companies including Oracle, Novell, Sun, and IBM/Lotus. XLIFF is close to OpenTag in many respects, but it is a more tightly defined format, with fewer ways to express the same content and better interoperability. The format specializes in storing text extracted from software-type files and tagged documents.
In comparing localization pack implementations based on XML and Java resource bundles, we find that the Java resource bundle offers high performance and is easy to use, while XML provides flexibility in organization, translation, live update, and future extension. Knowing their advantages allows us to use either one, or both together, to reach the best possible solution.
For more information on this topic, please contact us at
global@us.ibm.com.