2.2 Writing Localizable Code

To write code that can be easily localized, observe the guidelines in the following sections.

2.2.1 Eliminating Code Problems

You must ensure that the messages and text strings in your program do not have characteristics that hinder the localization process.

  • Remove problem characters. Eliminate the following problem characters from the messages and text strings in your code:

    Character Type          Problem
    ----------------------  --------------------------------------------------
    Control characters      Bell, carriage return, and line feed are
                            permissible. However, other control characters
                            might display differently to the translator. Even
                            if these characters display correctly, the
                            translator might not understand how to process
                            them.

    Line draw characters    Line draw characters might appear as different
                            characters in another code page. Do not hard code
                            them into your messages and text strings.

    Characters above 0x07E  Characters in this range vary across code pages
                            and might appear differently to the translator.
                            Even if these characters display correctly, the
                            translator might not understand how to process
                            them.

    Format specifiers       The message tools convert format specifiers (such
                            as "%d" and "%s") to reorderable tokens (such as
                            <x id="1" ts="%s" /> and <x id="2" ts="%d" />)
                            for the translator. The reorderable tokens can
                            then be used to specify the order of specific
                            parts of speech in a translated message string.

  • Use clear wording. Use clear and consistent wording, as shown in the following table:

    Issue:                          Culture-specific phrases
    Example (wrong):                John Doe
    Example (right):                generic name
    Resulting translation problem:  No translations for such phrases
    Solution:                       Use other descriptive words

    Issue:                          Noun clusters
    Example (wrong):                Volume mount problem list overflow
    Example (right):                The list of problems in mounting a volume exceeded the maximum space allowed.
    Resulting translation problem:  Several possible interpretations
    Solution:                       Add prepositions and articles to clarify intent

    Issue:                          Abbreviations
    Example (wrong):                int
    Example (right):                integer
    Resulting translation problem:  Is it "integer" or "interrupt"?
    Solution:                       Spell out words whenever possible

    Issue:                          Acronyms
    Example (wrong):                NLS
    Example (right):                Novell Licensing Services
    Resulting translation problem:  Several possible interpretations
    Solution:                       Make a note to the translator or spell out

    Issue:                          Technical jargon
    Example (wrong):                skulk
    Example (right):                In UNIX, a daemon that runs periodically to clean up files.
    Resulting translation problem:  Specialized meanings might not be clear to the translator
    Solution:                       Make a note to the translator

    Issue:                          Inconsistent terminology
    Example (wrong):                Not enough memory or Insufficient RAM
    Example (right):                (use only one consistently)
    Resulting translation problem:  The translator might not know whether the terms are used synonymously or to make a distinction
    Solution:                       Use only one word or phrase for a given concept

  • Remove fragmented text strings. Avoid using fragmented text strings, and do not construct text strings from fragments.

    Text strings constructed at run time from smaller string fragments might cause localization problems, as shown in the following example:

       printf("Invalid character found in file %s, ", fileName);
       printf("line %d.\n", lineNum);
       

    In the preceding example, the translator might see each printf as a separate string and might not know that the strings go together. Even if the translator understands that the fragments go together, they might not be able to translate the message correctly because of differences in word order from one language to another.

    To allow the translator to reorder words and insertion parameters as needed, make sure you display each message in its entirety, using a single call. For example:

       NWprintf("Invalid character found in file %s, line %d.\n", 
        fileName, lineNum);
       
  • Avoid language-specific programming techniques. Use programming techniques that are not specific to one language.

    For example,

       printf("%d user%s", count, count != 1 ? "s" : "");
       

    depends on making a word plural by adding an "s." Many languages form plurals in other ways. Instead, call printf separately to display each variation of the text string, as shown in the following example:

       if (count == 1) 
           printf("%d user", count); 
       else 
           printf("%d users", count);
       
  • Call NetWare internationalization functions. To print a string that has multiple insertion parameters, call one of the NetWare internationalization functions defined in nwlocale.h, such as the NWprintf function used in the earlier example.

    The standard C printf functions do not support parameter reordering.

  • Minimize hardware dependencies. To minimize hardware dependencies, call NetWare functions whenever possible. You can then indirectly access the hardware through a generic software interface.

    For example, rather than directly updating video memory, call a NetWare function to perform the same action. When you access hardware in this manner, your code is portable across all architectures that are supported by NetWare.

    If direct hardware access is required, move that part of the code into a separate NLM application.

    NOTE: Server machine configurations in some locales might be slower and have less memory than those in other locales.
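
The problem-character guideline at the start of this section lends itself to a mechanical check. The following is a minimal sketch, not a NetWare tool: the function name find_problem_char is hypothetical, and the check assumes that only bell, line feed, and carriage return are acceptable control characters, as stated in the table. It does not detect line draw characters specifically, although in most code pages those fall above 0x7E and are flagged for that reason.

```c
#include <stddef.h>

/* Hypothetical helper: returns the index of the first problem character
   in msg, or -1 if the string contains none.
   Permitted control characters: bell (0x07), line feed (0x0A),
   carriage return (0x0D). Every other control character and every
   byte above 0x7E is reported. */
long find_problem_char(const char *msg)
{
    const unsigned char *p = (const unsigned char *)msg;
    long i;

    for (i = 0; p[i] != '\0'; i++) {
        unsigned char c = p[i];

        if (c > 0x7E)
            return i;   /* varies across code pages */
        if (c < 0x20 && c != 0x07 && c != 0x0A && c != 0x0D)
            return i;   /* control character other than BEL, LF, CR */
    }
    return -1;
}
```

Running such a check over message strings before they are handed to the translator is one way to catch these characters early. Note that under these rules a tab character is also reported.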

2.2.2 Handling String Expansions

Every time your program retrieves and displays text, you must ensure there is enough room for the string (both in the memory buffer and on the screen).

  1. Determine the overall size, location, and layout for each instance where text is written to the screen.

  2. Allocate the needed room at run time, based on the localized string length, or build in enough extra room from the start.

    Design your applications to the expansion factors shown in Step 3, regardless of whether the factors are enforced by the tools you are using.

  3. To estimate the amount of room that a given string will need when it is localized into various languages, multiply its English length by the following expansion factor:

    Characters in English   Expansion Factor
    ---------------------   ----------------
    1-5                     3.1
    6-25                    2.2
    26-40                   1.9
    41-70                   1.7
    75+                     1.5

    The following are some example strings that show the number of characters needed in English and the number of characters that should be allowed for expansion into other languages:

    English String                    Characters   Expansion     Characters to Allow for
                                      in English   Factor Used   Translation (rounded)
    -------------------------------   ----------   -----------   -----------------------
    print                             5            3.1           16
    enter your user ID and password   31           1.9           59
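
The estimate in Step 3 can be sketched as a small helper. This is an illustration only: the function names are hypothetical, the factors are stored in tenths so that the arithmetic stays in integers, and because the table does not list lengths 71 through 74, the sketch assumes they use the more conservative 1.7 factor.

```c
/* Expansion factor for an English string length, in tenths
   (31 means 3.1). Values are taken from the table in Step 3. */
static int expansion_factor_tenths(int englishLen)
{
    if (englishLen <= 5)  return 31;
    if (englishLen <= 25) return 22;
    if (englishLen <= 40) return 19;
    if (englishLen <= 74) return 17;  /* assumption: 71-74 treated like 41-70 */
    return 15;
}

/* Number of characters to allow for the translated string,
   rounded to the nearest whole character (halves round up). */
int chars_to_allow(int englishLen)
{
    if (englishLen <= 0)
        return 0;
    return (englishLen * expansion_factor_tenths(englishLen) + 5) / 10;
}
```

For the five-character string "print", chars_to_allow(5) yields 16. A result like this can be used to size a display field or buffer when the localized length is not yet known; allocating from the actual localized string length at run time remains the more precise option.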

2.2.3 Handling Double-Byte Characters

Your program must be able to dynamically adapt text formatting operations to the specific requirements of the user’s locale (language and culture). Such operations include sorting, uppercasing, lowercasing, and formatting dates, times, numbers, and currency values for display. These types of values are often stored in double-byte characters.

Double-byte characters should never be targeted for case conversion, even if the characters represent English letters. Most case-conversion routines do not check whether a given byte is part of a double-byte character.

To ensure that your program effectively handles double-byte characters, use the following checklist as a guide.

  • Ignore double-byte characters. Ensure that your case-conversion routines skip over double-byte characters.

    Chinese, Japanese, and Korean text contains a mixture of single-byte and double-byte characters.

    Note that NetWare server names, volume names, Y/N responses, and drive letters must be single-byte characters.

  • Call double-byte aware functions. Call functions that are sensitive to the presence of double-byte characters for character-level string operations such as parsing, searching, comparing, wrapping, and truncating.

    Such functions query the NetWare operating system to determine whether a double-byte character set is being used. These functions also determine the range of character codes that is reserved for the leading byte of double-byte characters. The functions then use that information to detect double-byte characters in strings.

    The NetWare internationalization functions that are defined in nwlocale.h are double-byte aware. The following example calls NWLstrcspn (in Internationalization Functions of NLM and NetWare Libraries for C) to strip carriage return and line feed characters from a string:

       LONG DisplayStringCopy( /* routine to strip CR LF from string */ 
                 void *sourceAddress, 
                 void *destinationAddress, 
                 LONG numberOfBytes) 
       { 
         LONG indexCR, minIndex, len, offset, offset1, i; 
         char copy[255];   /* caller must pass numberOfBytes < 255 */
         char charset[3] = { 13, 10, '\0' }; 
        
         CMovB( (BYTE *)sourceAddress, copy, numberOfBytes ); 
         copy[ numberOfBytes ] = 0; 
        
         /* skip over all CR and LFs */ 
         len = numberOfBytes; 
         offset = 0; 
         offset1 = 0; 
         i = 0; 
        
         while ( i <  numberOfBytes ) { 
           
           /* find first occurrence of CR or LF */ 
           indexCR = NWLstrcspn( (char *)copy+offset, 
                               charset ); 
        
           if ( indexCR == (numberOfBytes - offset) ) { 
             CMovB( (char *)sourceAddress+offset, 
                    (char *)destinationAddress + offset1, 
                    numberOfBytes - offset ); 
             /* update offset last so the other counters use the old value */
             i += numberOfBytes - offset; 
             offset1 += numberOfBytes - offset; 
             offset = numberOfBytes; 
           } 
           else { 
             minIndex = indexCR; 
        
             /* copy everything up to CR or LF */ 
        
             CMovB( (char *)copy+offset, 
                    (char *)destinationAddress + offset1, 
                    minIndex ); 
             offset1 += minIndex; 
        
             /* skip over CR or LF */ 
             /* indexCR is relative to copy+offset, so add offset back */
             if ( NWCharType( copy[offset + indexCR] ) == NWDOUBLE_BYTE ) { 
               minIndex+=2; 
               len -= 2; 
             } 
             else { 
               minIndex++; 
               len--; 
             } 
             i += minIndex; 
             offset  += minIndex; 
           } 
         } 
         return( len );  
       }
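
The first checklist item, skipping double-byte characters during case conversion, can be sketched as follows. This is a minimal illustration, not NetWare code: the lead-byte test is hard-coded to the Shift-JIS ranges (0x81-0x9F and 0xE0-0xFC) as an assumption, whereas the double-byte aware functions described above obtain the ranges from the operating system. Both function names are hypothetical.

```c
#include <stddef.h>

/* Assumption for illustration: Shift-JIS lead-byte ranges.
   Real code should use the ranges reported for the active code page. */
static int sjis_is_lead_byte(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Uppercases single-byte ASCII letters in place and leaves both bytes
   of every double-byte character untouched. */
void upper_single_bytes(unsigned char *s)
{
    size_t i = 0;

    while (s[i] != '\0') {
        if (sjis_is_lead_byte(s[i]) && s[i + 1] != '\0') {
            i += 2;                 /* skip lead byte and trailing byte */
            continue;
        }
        if (s[i] >= 'a' && s[i] <= 'z')
            s[i] = (unsigned char)(s[i] - 'a' + 'A');
        i++;
    }
}
```

The trailing byte of a double-byte character can have the same value as an ASCII letter, which is why a routine that converts byte by byte without this check can corrupt the character.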
       

2.2.4 Using Double-Byte Algorithms

You should use double-byte aware algorithms when you need to perform character-level string operations manually (without calling a library function).

When you are manually processing a string that might contain double-byte characters, follow these steps:

  1. Check whether the first byte of the string falls within the lead-byte range for double-byte characters.

    Start with the first byte because it is the only byte whose role is certain. Any other byte in the string could be the trailing byte of a double-byte character.

  2. Advance one or two bytes (depending on the result) to the next character.

    In this manner, you can safely proceed from character to character until you find the target character.

    IMPORTANT: To simplify the process of isolating double-byte defects, call the NetWare internationalization functions defined in nwlocale.h, which is included in NLM and NetWare Libraries for C, rather than calling your own double-byte aware functions.
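
The two steps above can be sketched as a manual character search. As the note says, the nwlocale.h functions are preferable in production code; this sketch only illustrates the stepping logic. It assumes hard-coded Shift-JIS lead-byte ranges (0x81-0x9F and 0xE0-0xFC) for illustration, and the function names are hypothetical.

```c
#include <stddef.h>

/* Assumption for illustration: Shift-JIS lead-byte ranges. */
static int is_dbcs_lead_byte(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Returns the index of the first single-byte character equal to target,
   or -1 if there is none. Trailing bytes of double-byte characters are
   never compared against target. */
long dbcs_find_char(const unsigned char *s, unsigned char target)
{
    size_t i = 0;   /* always positioned on the first byte of a character */

    while (s[i] != '\0') {
        if (is_dbcs_lead_byte(s[i]) && s[i + 1] != '\0') {
            i += 2;                 /* double-byte: advance two bytes */
        } else {
            if (s[i] == target)
                return (long)i;
            i++;                    /* single-byte: advance one byte */
        }
    }
    return -1;
}
```

For example, in the byte sequence 0x83 0x5C 'a' 0x5C 'b', the first 0x5C is the trailing byte of a double-byte character. A byte-wise search for a backslash (0x5C) would stop at index 1, whereas the sketch returns index 3.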

2.2.5 Converting To/From Unicode

If your program interfaces with Novell eDirectory, you must send and receive text in Unicode format.

For functions that you can call to perform conversions between code pages and Unicode, see the NDK: Unicode book in NLM and NetWare Libraries for C.