Java

Internationalization
Frequently Asked Questions

This page answers common questions about internationalization of the Java 2 platform, Standard Edition, version 1.4.2, and of Sun's Java 2 Runtime Environments, Standard Edition, version 1.4.2. For more information, see the Internationalization home page.


General Questions

What is internationalization?

Internationalization allows software to be adapted to any language and cultural convention. During the internationalization process, the programmer isolates the parts of a program that are dependent on language and culture. For example, the programmer will isolate error messages because they must be translated during localization.

What is localization?

Localization is the process of adapting a program for use in a specific locale. A locale is a geographic or political region that shares the same language and customs. Localization includes the translation of text such as GUI labels, error messages, and online help. It also includes the culture-specific formatting of data items such as monetary values, times, dates, and numbers.

How do I go about internationalizing an existing program?

See the steps outlined in the Checklist section of The Java Tutorial.


Locales

What is a locale?

A locale is a geographic or political region that shares the same language and customs. In the Java programming language, a locale is represented by a Locale object. Locale-sensitive operations, such as collation and date formatting, vary according to locale.

Where can I find some coding examples that use Locale objects?

See the Setting the Locale section of The Java Tutorial.

Which locales are supported?

The supported locales vary between different implementations of the Java 2 platform and between areas of functionality. Information about the supported locales in Sun's Java 2 Runtime Environments is provided by the Supported Locales document.

Can a Java application use multiple locales?

Yes. This capability allows you to create multilingual applications.
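
As a minimal sketch of this, the following example formats one number for two locales in the same program. The class and method names (MultiLocaleDemo, formatNumber) are made up for illustration; only NumberFormat and Locale come from the platform.

```java
import java.text.NumberFormat;
import java.util.Locale;

class MultiLocaleDemo {
    // Formats a number using the conventions of the given locale.
    static String formatNumber(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        double value = 1234567.891;
        // The locale is passed explicitly, so the default locale plays no role.
        System.out.println("US:      " + formatNumber(value, Locale.US));
        System.out.println("Germany: " + formatNumber(value, Locale.GERMANY));
    }
}
```

Passing the locale explicitly to each locale-sensitive call is what lets one application serve several locales at once.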

Can I set the default locale from outside an application?

This depends on the implementation of the Java 2 platform you're using. The initial default locale is normally determined from the host operating system's locale. Version 1.4.2 of Sun's Java 2 Runtime Environments lets you override this by setting the user.language, user.country, and user.variant system properties from the command line. For example, to select Locale("th", "TH", "TH") as the initial default locale, you would use:

java -Duser.language=th -Duser.country=TH -Duser.variant=TH MainClass

Since not all runtime environments provide this feature, it should only be used for testing.


Resource Bundles

What is a resource bundle?

A ResourceBundle object allows you to isolate localizable elements from the rest of the application. With all resources separated into a bundle, the application simply loads the appropriate bundle for the active locale. If the user switches locales, the application just loads a different bundle.
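
The idea can be sketched with two small ListResourceBundle classes. All names here (BundleDemo, Labels, Labels_fr, the "greeting" key) are hypothetical, and the bundles are nested in one class only to keep the example self-contained; a real application would normally put each bundle in its own file.

```java
import java.util.ListResourceBundle;
import java.util.Locale;
import java.util.ResourceBundle;

class BundleDemo {
    // Base bundle: used when no more specific bundle matches.
    public static class Labels extends ListResourceBundle {
        protected Object[][] getContents() {
            return new Object[][] { { "greeting", "Hello" } };
        }
    }

    // French bundle: the "_fr" suffix ties it to the French language.
    public static class Labels_fr extends ListResourceBundle {
        protected Object[][] getContents() {
            return new Object[][] { { "greeting", "Bonjour" } };
        }
    }

    // Looks up the greeting in the bundle that best matches the locale.
    static String greeting(Locale locale) {
        ResourceBundle bundle = ResourceBundle.getBundle(Labels.class.getName(), locale);
        return bundle.getString("greeting");
    }

    public static void main(String[] args) {
        System.out.println(greeting(Locale.FRENCH));
        System.out.println(greeting(Locale.ROOT));
    }
}
```

Note that when no bundle matches the requested locale, getBundle tries the default locale before falling back to the base bundle, so the result for an unsupported locale can depend on the host configuration.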

Where can I find some coding examples that use ResourceBundle objects?

See the Isolating Locale-Specific Data section of The Java Tutorial.

How do I specify non-ASCII strings in a properties file?

You can specify any Unicode character with the \uXXXX notation. (The XXXX denotes the 4 hexadecimal digits that comprise the Unicode value of a character.) For example, a properties file might have the following entries:

s1=hello there
s2=\uff2d\uff33\u30b4

If you have edited and saved the file in a non-ASCII encoding, you can convert it to ASCII with the native2ascii tool. For example, you might want to do this when editing a properties file in Shift-JIS, a popular Japanese encoding.
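
To see how the escapes are decoded, the following sketch loads the two entries above from an in-memory copy of the file instead of from disk. The class and method names are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Properties;

class PropertiesEscapeDemo {
    // Loads properties from text as if it were the content of an ASCII properties file.
    static Properties load(String fileContent) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(fileContent.getBytes("ISO-8859-1")));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // The doubled backslashes keep the escapes literal in the file content,
        // so that Properties.load, not the compiler, decodes them.
        String fileContent = "s1=hello there\ns2=\\uff2d\\uff33\\u30b4\n";
        Properties props = load(fileContent);
        System.out.println(props.getProperty("s1"));
        // Each escape becomes one char, so s2 holds three characters.
        System.out.println(props.getProperty("s2").length());
    }
}
```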

How do I compile a non-ASCII ListResourceBundle?

If your source file is in a non-ASCII encoding, you can direct the compiler to convert it into Unicode. For example, you would compile a Japanese resource bundle written in the Shift-JIS encoding as follows:

javac -encoding SJIS LabelsResource_ja.java


Text Processing

How do I format a date?

You can use the SimpleDateFormat class to format and parse dates in a locale-sensitive manner. See the section on formatting Dates and Times in The Java Tutorial.
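
A short sketch of locale-sensitive formatting follows. The patterns, the fixed UTC time zone, and the helper name are choices made for this example so that the output does not depend on the host; they are not requirements of the API.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class DateFormatDemo {
    // Formats a date with the given pattern, using month names of the given locale.
    static String format(Date date, String pattern, Locale locale) {
        SimpleDateFormat formatter = new SimpleDateFormat(pattern, locale);
        // Pin the time zone so the result does not depend on the host settings.
        formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
        return formatter.format(date);
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L); // midnight, January 1, 1970 UTC
        System.out.println(format(epoch, "MMMM d, yyyy", Locale.US));
        System.out.println(format(epoch, "d MMMM yyyy", Locale.FRENCH));
    }
}
```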

Are formatters thread-safe?

Instances of java.text.Format and its subclasses are generally not synchronized. It is recommended to create separate format instances for each thread. If multiple threads access a format concurrently, it must be synchronized externally.
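
One way to give each thread its own instance is a ThreadLocal, sketched below. This is only one possible approach; the class name and the date pattern are hypothetical, and the generic ThreadLocal syntax assumes a release newer than 1.4.2.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class PerThreadFormatDemo {
    // Each thread lazily creates its own formatter on first use and never shares it.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
        new ThreadLocal<SimpleDateFormat>() {
            @Override
            protected SimpleDateFormat initialValue() {
                SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd", Locale.US);
                f.setTimeZone(TimeZone.getTimeZone("UTC"));
                return f;
            }
        };

    // Safe to call from any thread: uses the calling thread's own formatter.
    static String format(Date date) {
        return FORMAT.get().format(date);
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable task = new Runnable() {
            public void run() {
                System.out.println(Thread.currentThread().getName() + ": " + format(new Date(0L)));
            }
        };
        Thread first = new Thread(task, "worker-1");
        Thread second = new Thread(task, "worker-2");
        first.start();
        second.start();
        first.join();
        second.join();
    }
}
```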

How does setting the default locale affect the results of sorting?

The Collator class and its subclasses are used for building sorting routines. These classes are locale-sensitive: a Collator obtained from the no-argument Collator.getInstance() factory method uses the collating sequence of the default locale.
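
The following sketch sorts a small array with a Collator. It passes the locale explicitly rather than relying on the default locale, so that the result is predictable; the class name and word list are invented for the example.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

class CollatorSortDemo {
    // Returns a copy of the words sorted by the collation rules of the locale.
    static String[] sorted(String[] words, Locale locale) {
        String[] copy = words.clone();
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        String[] words = { "banana", "apple", "Cherry" };
        // Plain String ordering would put "Cherry" first, because uppercase
        // letters have lower code values than lowercase letters.
        System.out.println(Arrays.toString(sorted(words, Locale.US)));
    }
}
```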

The Collator object supports different levels of decomposition and strength. How do I choose the right decomposition and strength in a locale?

Since decomposing takes time, turning decomposition off makes comparisons go faster. However, for Latin languages the NO_DECOMPOSITION mode is not useful if the text contains accents. You should use the default decomposition unless you really know what you're doing.

The strength property you choose depends on what your application is trying to accomplish. For example, when performing a text search you may allow a "weak" match, in which accents and differences in case (upper vs. lower) are ignored. This type of search employs the PRIMARY strength. If you are sorting a list of words, you might want to use the TERTIARY strength. In this mode the properties that must match are the base character, accent, and case.
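
The difference between the strengths can be sketched by comparing two spellings of the same word. The helper below is hypothetical and fixes the locale to Locale.US purely to keep the example predictable.

```java
import java.text.Collator;
import java.util.Locale;

class CollatorStrengthDemo {
    // Compares two strings with a US English collator at the given strength.
    static int compare(String a, String b, int strength) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        String plain = "resume";
        String accented = "R\u00e9sum\u00e9"; // capital R, e with acute accents
        // PRIMARY ignores accent and case differences, so the strings match.
        System.out.println(compare(plain, accented, Collator.PRIMARY) == 0);
        // TERTIARY also considers accents and case, so the strings differ.
        System.out.println(compare(plain, accented, Collator.TERTIARY) == 0);
    }
}
```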


Character Encodings

What is a character encoding?

A character encoding is a mapping between characters and code values.

What is Unicode?

In the Java programming language, char values are 16 bits wide and represent Unicode characters. Unicode is a character encoding standard that supports the world's major languages. You can learn more about the Unicode standard at the Unicode Consortium web site.

How do I convert data between Unicode and other character encodings?

The Converting Non-Unicode Text section of The Java Tutorial explains how to perform the conversions within an application using high-level APIs. If you need more direct access to character conversion, see the java.nio.charset.Charset class. To convert data files, use the native2ascii tool.
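
As a rough sketch of an in-application conversion, the example below encodes a string to bytes and decodes it again. The helper names are made up, and the String overloads that accept a Charset are assumed to be available; they were added after release 1.4.2, where the overloads taking an encoding name serve the same purpose.

```java
import java.nio.charset.Charset;

class EncodingConversionDemo {
    // Converts Unicode text to bytes in the named encoding.
    static byte[] encode(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    // Converts bytes in the named encoding back to Unicode text.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // ends with e with acute accent
        // The accented letter takes two bytes in UTF-8 but one in ISO 8859-1.
        System.out.println(encode(text, "UTF-8").length);
        System.out.println(encode(text, "ISO-8859-1").length);
        // Decoding with the same encoding restores the original text.
        System.out.println(decode(encode(text, "UTF-8"), "UTF-8").equals(text));
    }
}
```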

Which character encodings are supported when converting text to and from Unicode?

See the Supported Encodings web page.

How do I create my own character converters?

The java.nio.charset.spi.CharsetProvider class lets developers create their own character converters.

What is the default encoding?

The default encoding is selected by the Java runtime based on the host operating system and its locale. For example, in the US locale on Windows, Cp1252 is used. In the Simplified Chinese locale on Solaris, either EUC_CN or GBK can be the default encoding, depending on the selection made when logging into Solaris.

The default encoding is significant because the Java programming language uses Unicode to represent characters, but the file system of the host operating system usually uses some other encoding. The default encoding has to match the encoding used by the host operating system to ensure correct interaction.

Why can't I use all European characters on Solaris?

There are many character encodings that don't support all European characters (such as "ß" or "é"), but we get this question particularly often from users of the Solaris C locale. On Solaris and Linux, the Java 2 Runtime Environment version 1.2 and higher determines the default encoding by calling the nl_langinfo function. On Solaris 7 and higher, this function returns "646" when run in the C locale, indicating ISO 646 or ASCII as the default encoding. ASCII only includes half the characters of ISO 8859-1, so many commonly used European characters are missing.

An easy workaround is to use the Solaris en_US locale, which uses ISO 8859-1 as its character encoding. You can set the Solaris locale from the login screen or by setting the LC_ALL environment variable. Another solution is to explicitly specify the desired character encoding in your calls to the String, java.io, and java.nio APIs that perform encoding conversion.
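
Specifying the encoding explicitly might look like the sketch below, which decodes ISO 8859-1 bytes through an InputStreamReader regardless of the default encoding. The class name and the in-memory byte array are stand-ins for a real file or socket stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

class ExplicitEncodingDemo {
    // Decodes the bytes using the named encoding instead of the default encoding.
    static String decode(byte[] data, String encoding) {
        try {
            Reader reader = new InputStreamReader(new ByteArrayInputStream(data), encoding);
            StringBuilder text = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
            return text.toString();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Sharp s and e with acute accent, as encoded in ISO 8859-1.
        byte[] data = { (byte) 0xDF, (byte) 0xE9 };
        String text = decode(data, "ISO-8859-1");
        // Print the code values to avoid depending on the console encoding.
        System.out.println(Integer.toHexString(text.charAt(0)));
        System.out.println(Integer.toHexString(text.charAt(1)));
    }
}
```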

What is the UTF-8 encoding?

UTF-8 stands for Unicode (or UCS) Transformation Format, 8-bit encoding form. It is a transmission format for Unicode that is suitable for use with many network protocols and UNIX file systems.

Are the Cp1252 and ISO8859_1 encodings identical?

No. Cp1252 contains some additional characters in the range from 0x80 to 0x9F. See the Microsoft documentation for more information.
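
The difference can be observed by decoding the same byte with both encodings, as in this sketch. It uses "windows-1252", the java.nio name for Cp1252, and assumes a runtime where the String constructor taking a Charset is available; the class and method names are illustrative.

```java
import java.nio.charset.Charset;

class Cp1252Demo {
    // Decodes a single byte value with the named charset and returns the character.
    static char decodeByte(int byteValue, String charsetName) {
        byte[] data = { (byte) byteValue };
        return new String(data, Charset.forName(charsetName)).charAt(0);
    }

    public static void main(String[] args) {
        // 0x80 is the Euro sign in Cp1252 but a control character in ISO 8859-1.
        System.out.println(Integer.toHexString(decodeByte(0x80, "windows-1252")));
        System.out.println(Integer.toHexString(decodeByte(0x80, "ISO-8859-1")));
        // Outside the 0x80 to 0x9F range the two encodings agree.
        System.out.println(decodeByte(0xE9, "windows-1252") == decodeByte(0xE9, "ISO-8859-1"));
    }
}
```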


Text Input

What is the Input Method Framework?

The input method framework enables all text editing components to receive Japanese, Chinese, or Korean text input through input methods. An input method lets users enter thousands of different characters using keyboards with far fewer keys. Typically a sequence of several characters needs to be typed and then converted to create one or more characters. For specifications and examples, see the Input Method Framework web page.

What does it mean to switch input methods?

A user may have multiple input methods available. For example, the user may have input methods for different languages or input methods that accept various types of input. Such a user must be able to select the input method used for a particular language or the input method that provides the fastest input.

Can an input method be selected and activated programmatically?

An application can request an input method that supports a specific locale using the InputContext.selectInputMethod method, but it cannot select a specific input method - that selection is up to the user.

An application can activate an input method using the InputContext.setCompositionEnabled method.

Do the AWT and Swing (JFC) text components work with input methods?

See the Input Methods section of the Java 2 SDK Internationalization Overview.


Text Rendering

What choices does an application have in selecting fonts?

An application using lightweight components can select fonts in four different ways:

An application using peered AWT components can only use logical font names.

What are the advantages and disadvantages of these four approaches?

Here's a brief summary:

Why doesn't my application display any Chinese, Japanese, or Korean characters even though I have fonts for these languages installed?

The answer depends on how your application selects fonts - see above.

What is a font.properties file?

The font.properties files are used in Sun's Java 2 Runtime Environments to map logical font names to physical fonts. There are several files to support different mappings depending on host operating system version and locale. The files are located in the lib directory within the J2RE installation.

Note that font.properties files are implementation dependent. Not all implementations of the Java 2 platform use them, and the format and content vary between different runtime environments as well as between releases.

How do I add a physical font to the mapping of a logical font?

Since the mapping from logical fonts to physical fonts is implementation dependent, the answer varies. For Sun's Java 2 Runtime Environments, you need to create or modify a font.properties file - see the web page The font.properties Files. Note however that this is a modification of the J2RE, and Sun does not support modified J2REs. For other implementations, see their respective documentation.

Why can I see some characters in Swing components, but not in peered AWT components?

Swing user interface components use a different mechanism to render text than peered AWT components. The Swing components use the Graphics.drawString method, typically specifying a logical font name. The logical font name is then mapped to a set of physical fonts to cover a large range of characters. AWT components on the other hand are implemented using host operating system components. These host operating system components often do not support Unicode, so the text gets converted to some other character encoding, depending on the host operating system and locale. These encodings often cover a smaller range of characters than the physical fonts used to implement logical font names. For example, on a Japanese Windows 98 system, many European accented characters are mapped to the Arial font for Swing components, but get lost when converting the text to the Shift-JIS encoding for peered AWT components.

Why can't my application display all Unicode characters even though I have a Unicode font installed?

As in the Chinese/Japanese/Korean case above, this may be because the Unicode font is not used to render the text at all, or is used only for some characters. If your application selects the Unicode font using its physical font name, and it still cannot render all characters, it could be that the Unicode font doesn't in fact cover the entire Unicode character set - sometimes a font is called a Unicode font if it just provides the tables that support the Unicode character encoding.

What font types do Sun's Java 2 Runtime Environments support?

See the Supported Fonts document.

Is it possible to display more than one language in Sun's Java 2 Runtime Environments?

The short answer is yes. The long answer depends on which languages you want to display at the same time and on how your application selects fonts.

Can Sun's Java 2 Runtime Environment render text in Thai, Lao, Burmese, or any of the Indic scripts?

Among the South and South-East Asian scripts, version 1.4.2 of Sun's Java 2 Runtime Environments supports Thai and Devanagari. For a complete list of all supported writing systems, see the Supported Locales document. Support for other writing systems may be added in future releases.


Component Orientation

Which user interface components implement component orientation in Sun's Java 2 Runtime Environments?

See the Supported Locales document.


Miscellaneous

Do Sun's Java 2 Runtime Environments support the Euro currency?

Yes, Sun's Java 2 Runtime Environments let you type the Euro character, render it, convert it from and to numerous character encodings, and use it when formatting numeric values as currency. For text input and rendering, you need the appropriate support in the host operating system - see the documentation for Windows and Solaris. For formatting with a currency symbol, Sun's Java 2 Runtime Environments v. 1.4.2 uses the Euro as the default currency for the member countries of the European Monetary Union.
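
Currency formatting with the Euro can be sketched as follows. The class and method names are hypothetical, and the exact layout of the formatted amount (spacing, position of the symbol) depends on the runtime's locale data.

```java
import java.text.NumberFormat;
import java.util.Currency;
import java.util.Locale;

class EuroDemo {
    // Returns the ISO 4217 code of the default currency for the locale's country.
    static String currencyCode(Locale locale) {
        return Currency.getInstance(locale).getCurrencyCode();
    }

    // Formats an amount as currency using the conventions of the locale.
    static String formatAmount(double amount, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(amount);
    }

    public static void main(String[] args) {
        System.out.println(currencyCode(Locale.GERMANY));
        // Whether the Euro sign displays correctly depends on the console.
        System.out.println(formatAmount(1234.5, Locale.GERMANY));
    }
}
```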


Copyright © 2003 Sun Microsystems, Inc. All Rights Reserved.

Please send comments to: java-intl@java.sun.com
