Novell Documentation: NetWare 6

Code Pages

A code page is a table that defines a character set supporting one or more language scripts. When you press a key on a keyboard, the computer receives a numeric code that represents that keystroke; the code page determines which character each numeric code stands for. Many personal computer operating systems support multiple code pages and allow you to switch among them.

For example, DOS uses code page 437 for several languages written with the Roman alphabet, including English, French, and German, but requires code page 850 for Portuguese. DOS code page 850 (Portuguese) removes the ƒ (franc) symbol and inserts Ó (O acute). Different operating systems also use different code pages for the same language: DOS uses code page 437 for English, but Windows* 95 uses code page 1252.
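Because a code page is only a mapping from byte values to characters, the same byte can stand for different characters under different code pages. The following minimal sketch uses Python's built-in `cp437`, `cp850`, and `cp1252` codecs to illustrate this; the helper name `decode_byte` is hypothetical and chosen for illustration.

```python
# Minimal sketch: a code page maps byte values to characters.
# Uses only Python's built-in "cp437", "cp850", and "cp1252" codecs.

def decode_byte(byte_value: int, code_pages: list[str]) -> dict[str, str]:
    """Return the character that each code page assigns to one byte value."""
    raw = bytes([byte_value])
    return {cp: raw.decode(cp) for cp in code_pages}

# Byte 0x9B: cent sign in code page 437, o with stroke in code page 850.
cent_slot = decode_byte(0x9B, ["cp437", "cp850"])   # {'cp437': '¢', 'cp850': 'ø'}

# Byte 0xE0: Greek alpha in code page 437, O with acute accent in code page 850.
alpha_slot = decode_byte(0xE0, ["cp437", "cp850"])  # {'cp437': 'α', 'cp850': 'Ó'}

# The same letter (é, as in "café") has different byte values in
# DOS code page 437 and Windows code page 1252.
dos_bytes = "\u00e9".encode("cp437")       # b'\x82'
windows_bytes = "\u00e9".encode("cp1252")  # b'\xe9'
```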

In a single-byte code page, up to 256 codes are available to represent lower- and uppercase letters, numbers, punctuation marks, and all the mathematical symbols on your keyboard.

However, 256 codes are not sufficient to represent all the letters and characters used in the writing systems of every language. Some nonalphabetic writing systems, such as Chinese, Japanese, and Korean, contain thousands of characters and require a double-byte code page.
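The 256-code limit follows from one byte holding 8 bits. The minimal sketch below, using Python's built-in `cp932` (Japanese) codec, shows that a double-byte code page still stores Latin letters in one byte but needs two bytes for each kanji. The helper `byte_lengths` is a hypothetical name for illustration.

```python
# Minimal sketch: count bytes per character with Python's built-in "cp932" codec.

# One byte holds 8 bits, so a single-byte code page has at most 2**8 codes.
SINGLE_BYTE_CODES = 2 ** 8  # 256

def byte_lengths(text: str, code_page: str) -> list[int]:
    """Return the number of bytes each character occupies in code_page."""
    return [len(ch.encode(code_page)) for ch in text]

# In code page 932 (Japanese), the Latin letter A takes one byte,
# while the kanji 日 and 本 ("Japan") take two bytes each.
lengths = byte_lengths("A\u65e5\u672c", "cp932")  # [1, 2, 2]
```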

Mismatched code pages, whether single-byte or double-byte, can cause display and readability problems. For example, a document created with Windows 95 in Japan probably uses code page 932. That document does not display correctly on a Windows 95 computer in the United States that uses code page 1252: unrecognized characters are replaced with a symbol such as a heart. In the past, these substituted characters might have caused a database such as Novell eDirectory to fail to recognize objects.
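A related effect can be reproduced with Python's built-in codecs: bytes written under code page 932 and read back under code page 1252 turn into unrelated characters. This minimal sketch shows only byte reinterpretation; it does not model how Windows 95 itself chooses a substitute symbol such as a heart.

```python
# Minimal sketch using Python's built-in codecs; it shows byte
# reinterpretation, not Windows 95's own display logic.

original = "\u65e5\u672c"                # 日本 ("Japan")
stored_bytes = original.encode("cp932")  # b'\x93\xfa\x96{' on a code page 932 system
misread = stored_bytes.decode("cp1252")  # '“ú–{' when read as code page 1252
```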

To help resolve these problems, the Unicode* standard has been adopted.

Using Unicode

Unicode is a 16-bit character representation, defined by the Unicode Consortium, that supports up to 65,536 unique characters. Unicode allows the characters for multiple languages to be represented using a single Unicode representation.
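The 16-bit model can be illustrated with a short sketch: characters from Latin, Greek, Cyrillic, and Han scripts each get one 4-digit hexadecimal Unicode value and occupy exactly one 16-bit unit (2 bytes) in UTF-16. This assumes characters in the range U+0000 to U+FFFF, which matches the 65,536-character model described here; the sample dictionary is illustrative only.

```python
# Minimal sketch: one 16-bit value per character, whatever the script.

UNICODE_16BIT_VALUES = 2 ** 16  # 65536

samples = {
    "Latin": "A",
    "Greek": "\u03a9",     # Ω
    "Cyrillic": "\u0416",  # Ж
    "Han": "\u65e5",       # 日
}

# Four-digit hexadecimal Unicode values.
hex_values = {script: f"{ord(ch):04X}" for script, ch in samples.items()}
# {'Latin': '0041', 'Greek': '03A9', 'Cyrillic': '0416', 'Han': '65E5'}

# In UTF-16, each of these characters is one 16-bit unit, that is, 2 bytes.
utf16_sizes = {script: len(ch.encode("utf-16-be")) for script, ch in samples.items()}
```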

Any character that your code page cannot represent is displayed as the 4-digit hexadecimal value of its Unicode character, enclosed in square brackets; for example, [00FF].
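The bracket substitution rule can be sketched in Python. The function `display_in_code_page` is hypothetical and is not part of eDirectory or any Novell API; it only mimics the [XXXX] rule, using Python's built-in codecs to decide whether a code page contains a character, and it assumes characters in the range U+0000 to U+FFFF so that four hex digits suffice.

```python
def display_in_code_page(text: str, code_page: str) -> str:
    """Replace characters that code_page cannot represent with [XXXX].

    XXXX is the 4-digit uppercase hexadecimal Unicode value.
    Assumes every character is in the range U+0000..U+FFFF.
    """
    shown = []
    for ch in text:
        try:
            ch.encode(code_page)  # succeeds only if the code page has this character
            shown.append(ch)
        except UnicodeEncodeError:
            shown.append(f"[{ord(ch):04X}]")
    return "".join(shown)
```

For example, `display_in_code_page("\u00a4W-Euro", "cp437")` returns `"[00A4]W-Euro"`, because DOS code page 437 has no currency sign, while the same call with `"cp852"` returns the name unchanged.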

Because eDirectory supports Unicode, substituted characters do not prevent eDirectory from recognizing an object. For example, your company's European office might create an Organizational Unit object to represent Finance in Western Europe. They might use DOS code page 852 to make the generic currency symbol part of the object name (OU=¤W-Euro).

When this object is accessed in the United States using DOS code page 437, which has no currency symbol, the currency symbol (¤) is replaced by its Unicode number enclosed in square brackets, [00A4]. eDirectory recognizes the Unicode number, so the object can still be opened and accessed.
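The bracketed form still carries the full Unicode value, so the original character can be recovered from it. The minimal sketch below shows one way to do that with a regular expression; `restore_unicode` is a hypothetical helper, not eDirectory's actual mechanism.

```python
import re

# Matches the bracketed form: exactly four hexadecimal digits in square brackets.
BRACKETED = re.compile(r"\[([0-9A-Fa-f]{4})\]")

def restore_unicode(displayed: str) -> str:
    """Turn each [XXXX] back into the Unicode character it names."""
    return BRACKETED.sub(lambda m: chr(int(m.group(1), 16)), displayed)

name = restore_unicode("OU=[00A4]W-Euro")  # 'OU=¤W-Euro'
```

This sketch assumes that a name never contains a literal square-bracketed run of four hex digits; if it could, the displayed form alone would be ambiguous.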

However, the object name (containing the square brackets and Unicode number) will be difficult for users to understand. If the name is too difficult to interpret, the only solution is to determine which code page was used to create the object and then view the object using that code page. Changing code pages can be troublesome; see Changing Code Pages for guidelines.
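One way to narrow down which code page was used is to test which candidate code pages can represent every character in the restored name. The minimal sketch below does this with Python's built-in codecs; `code_pages_that_fit` is a hypothetical helper. It can only rule code pages out: when several candidates fit, it cannot tell which one was actually used.

```python
def code_pages_that_fit(name: str, candidates: list[str]) -> list[str]:
    """Return the candidate code pages that can represent every character of name."""
    fitting = []
    for cp in candidates:
        try:
            name.encode(cp)
        except UnicodeEncodeError:
            continue  # at least one character is missing from this code page
        fitting.append(cp)
    return fitting

matches = code_pages_that_fit("\u00a4W-Euro", ["cp437", "cp850", "cp852"])
# ['cp850', 'cp852']: both contain the currency sign; code page 437 does not.
```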

The following table shows ranges of Unicode numbers, with a description of each range and a list of code pages that might be used to view the character correctly. However, switching to one of the suggested code pages does not guarantee that you will see the correct results. For example, characters in the range 4E00-9FFF (Han Ideographs) are used in Japan, China, and Korea. But switching to code page 932 (Japanese) does not display the character correctly if the character is used only in China.

The most reliable way to determine the character is to refer to the Unicode Standard, Version 2.0. Access the Unicode Web site for more information. The Web site also includes charts of Unicode characters.

Table 1. Unicode Ranges, Descriptions, and Code Pages

Unicode Range	Description	Geographical Region	Windows Code Pages	DOS Code Pages
0080 - 00FF	Extended Latin	Western Europe	1252	437, 850, 860, 863, 865
0100 - 01FF	Extended Latin	Central Europe	1250, 1257	852, 775
0300 - 03FF	Greek	Greece	1253	737
0400 - 04FF	Cyrillic	Russia	1251	855, 866
0590 - 05FF	Hebrew	Israel	1255	862
0600 - 06FF	Arabic	Middle East	1256	864
2500 - 26FF	Line Drawing and Graphics	N/A	N/A	Most DOS code pages
4E00 - 9FFF	Han Ideographs	Far East	932, 936, 949, 950	932, 936, 949, 950
AC00 - D7FF	Hangul Syllables	Korea	949	949
FE70 - FEFF	Arabic Presentation Forms	Middle East	N/A	864
FF00 - FFEF	Full- and Half-Width Variants	Far East	932, 936, 949, 950	932, 936, 949, 950
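A lookup against Table 1 can be sketched as a list of ranges. The minimal example below encodes a subset of the table's rows and maps a displayed value such as [0416] to the code pages worth trying; `RangeHint` and `suggest_code_pages` are hypothetical names. As the text notes, a match only suggests code pages to try and does not guarantee correct display.

```python
from typing import NamedTuple, Optional


class RangeHint(NamedTuple):
    """One row of Table 1: a Unicode range and code pages that may display it."""
    first: int
    last: int
    description: str
    windows: tuple[str, ...]
    dos: tuple[str, ...]


# A subset of Table 1. The Han range uses 4E00-9FFF, the bounds given in the text.
HINTS = (
    RangeHint(0x0400, 0x04FF, "Cyrillic", ("1251",), ("855", "866")),
    RangeHint(0x0590, 0x05FF, "Hebrew", ("1255",), ("862",)),
    RangeHint(0x0600, 0x06FF, "Arabic", ("1256",), ("864",)),
    RangeHint(0x4E00, 0x9FFF, "Han Ideographs",
              ("932", "936", "949", "950"), ("932", "936", "949", "950")),
    RangeHint(0xAC00, 0xD7FF, "Hangul Syllables", ("949",), ("949",)),
)


def suggest_code_pages(bracketed: str) -> Optional[RangeHint]:
    """Map a displayed value such as "[0416]" to the matching table row, if any."""
    code_point = int(bracketed.strip("[]"), 16)
    for hint in HINTS:
        if hint.first <= code_point <= hint.last:
            return hint
    return None


hint = suggest_code_pages("[0416]")  # Cyrillic row: Windows 1251; DOS 855 or 866
```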