Novell Doc: NDK: Libraries for C (LibC), Volume 1

4.1 String Interface Sets

Depending on the kind of strings your application manipulates, you can use one of the following interface sets:

Standards-based single-byte character functionality
Pseudo standards-based multibyte character functionality
Standards-based wide-character functionality
Special LibC Unicode functionality

Flag commands set at link time also affect the string types that are sent to your application. If your application is using NKS functions, the following bits affect string type:

If the Unicode bit (0x02000000) is set with FLAG ON, the NKS functions assume Unicode strings.
If the UTF-8 bit (0x00800000) is set with FLAG ON, the NKS functions assume UTF-8 strings.
If neither the Unicode nor the UTF-8 bit is set, the NKS functions assume ASCIIZ strings, including multibyte.

If your application is using POSIX* and ANSI functions, the following bits affect string type:

If the UTF-8 bit (0x00800000) is set with FLAG ON, the functions assume UTF-8 strings.
If the UTF-8 bit is not set, they assume ASCIIZ strings, including multibyte.
Setting the Unicode bit has no effect. POSIX/ANSI functions do not accept Unicode strings.
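
The two sets of rules above can be summarized in a short sketch. The bit values come from the text; the macro names, the enum, and both function names are hypothetical and are not part of LibC. The text does not say what happens when both bits are set, so checking the Unicode bit first is an assumption.

```c
#include <stdint.h>

/* Bit values quoted in the text; the macro names are hypothetical. */
#define FLAG_UNICODE_STRINGS 0x02000000u
#define FLAG_UTF8_STRINGS    0x00800000u

typedef enum { STR_ASCIIZ, STR_UTF8, STR_UNICODE } string_kind;

/* String type that NKS functions would assume for a given flag word.
   Assumption: the Unicode bit is tested first; the precedence when
   both bits are set is not documented in this section. */
string_kind nks_string_kind(uint32_t flags)
{
    if (flags & FLAG_UNICODE_STRINGS)
        return STR_UNICODE;
    if (flags & FLAG_UTF8_STRINGS)
        return STR_UTF8;
    return STR_ASCIIZ;   /* includes multibyte strings */
}

/* String type that POSIX and ANSI functions would assume.
   The Unicode bit is ignored here, as the text states. */
string_kind posix_string_kind(uint32_t flags)
{
    return (flags & FLAG_UTF8_STRINGS) ? STR_UTF8 : STR_ASCIIZ;
}
```

For example, a flag word with only the Unicode bit set yields STR_UNICODE for NKS functions but STR_ASCIIZ for POSIX and ANSI functions.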

For more information on linker commands, see Section 1.6, Using a Linker Definition File.

4.1.1 Single-Byte Character Strings

LibC implements the string.h and ctype.h interfaces from ISO 9899:1990, 9945:1996, and 9899:1999.

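These functions treat every byte as one character. If your application might receive multibyte strings of any sort, many of them can give wrong results, for example by splitting a double-byte character in two, and the more sophisticated the operation, the more likely the trouble.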

4.1.2 Multibyte Character Strings

On NetWare, multibyte currently means double-byte, because all platforms that presently host NetWare use local code pages whose multibyte characters need at most two bytes. For example, on the NetWare server, Japanese characters can be single-byte or double-byte. A double-byte Japanese character is detected when the first byte falls in one of the hexadecimal ranges {81..9F} or {E0..FC}.
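
The lead-byte test described above can be written as a small predicate. This is a minimal sketch that hard-codes the two Japanese ranges given in the text; the function name is hypothetical and is not a LibC interface, and other code pages would need their own ranges.

```c
#include <stdbool.h>

/* Returns true when b is the first byte of a double-byte Japanese
   character, using the ranges from the text: 0x81..0x9F and 0xE0..0xFC. */
bool is_japanese_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}
```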

The string.h and ctype.h headers also declare interfaces for directly handling strings known to be multibyte ASCII. These functions keep the standards-based name and add the prefix L (for example, Lstrcmp). They are safe to use on any string, single-byte or multibyte.

If, for example, you want a count of the characters in a string, you cannot rely on strlen, because it returns the number of bytes. A string that contains one double-byte character has one fewer character than strlen reports. Similarly, the usual functions can place character boundaries incorrectly when comparing (strcmp), scanning (strtok, strspn, etc.), or performing other operations. For multibyte character strings, use Lstrcmp, Lstrtok_r, Lstrspn, etc. instead.
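
To illustrate the difference between bytes and characters, here is a portable sketch that counts characters in a double-byte Japanese string. It is not the LibC implementation and does not call the L-prefixed functions; the names is_lead_byte and dbcs_char_count are hypothetical, and the lead-byte ranges are the Japanese ones quoted in Section 4.1.2.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Lead-byte ranges for Japanese, as given in Section 4.1.2. */
static int is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Counts characters (not bytes) in a NUL-terminated string that may
   mix single-byte and double-byte characters. */
size_t dbcs_char_count(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    while (*s != '\0') {
        /* A lead byte consumes the following byte too, unless the
           string is truncated right after the lead byte. */
        if (is_lead_byte((unsigned char)*s) && s[1] != '\0')
            s += 2;
        else
            s += 1;
        count++;
    }
    return count;
}
```

For the four-byte string "A\x82\xA0" "B" (one double-byte character between two single-byte ones), strlen returns 4 while dbcs_char_count returns 3.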

4.1.3 Wide Character Strings

LibC implements the wchar.h and wctype.h interfaces from ISO 9899:1999. In LibC, wide character means Unicode.

If you are manipulating wide (Unicode) characters, you might want to use interfaces such as wcscat and wcsspn (similar to strcat and strspn) from wchar.h.
If you are translating between wide characters and characters in the code page of the local host hardware, you might use mbtowc, wcstombs, etc. from stdlib.h.
If you are classifying wide characters, you should use the isw... family of functions from wctype.h.
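
The following sketch shows the wchar.h and wctype.h calls named above in use. Only standard C functions are called; the three wrapper names (wide_join, leading_digits, count_wide_alpha) are hypothetical helpers introduced for illustration. Conversion with mbtowc or wcstombs is left out because its results depend on the current locale.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Joins a and b into out, which holds cap wchar_t elements.
   Returns 0 on success, -1 if out is too small. */
int wide_join(wchar_t *out, size_t cap, const wchar_t *a, const wchar_t *b)
{
    assert(out != NULL && a != NULL && b != NULL);
    if (wcslen(a) + wcslen(b) + 1 > cap)
        return -1;
    wcscpy(out, a);
    wcscat(out, b);          /* wide counterpart of strcat */
    return 0;
}

/* Length of the initial run of decimal digits in s. */
size_t leading_digits(const wchar_t *s)
{
    assert(s != NULL);
    return wcsspn(s, L"0123456789");   /* wide counterpart of strspn */
}

/* Number of alphabetic wide characters in s, classified by iswalpha. */
size_t count_wide_alpha(const wchar_t *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != L'\0'; s++) {
        if (iswalpha((wint_t)*s))
            n++;
    }
    return n;
}
```

Joining L"2024" and L"-log" gives L"2024-log", for which leading_digits returns 4 and count_wide_alpha returns 3.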

4.1.4 Unicode Strings

If you want to translate between Unicode and a code page that is not the one set for the local host hardware, you must use the functions from unilib.h. This header and its interfaces accommodate quick, diverse, and flexible translation among ASCII multibyte, Unicode, and UTF-8 strings. So that you can stay in unicode_t rather than converting to wchar_t and back, unilib.h also provides character manipulation functions similar to those in wchar.h, including unicat, unispn, etc.

4.1.5 UTF-8 Strings

If all your strings are in UTF-8, direct manipulation is a problem, just as it is for multibyte ASCII strings in the local code page: UTF-8 strings are, in essence, multibyte strings and suffer from the same character-boundary problems. The functions in utf8.h form a set roughly parallel to those in string.h for normal and multibyte ASCII strings, for example utf8len, utf8cmp, and utf8spn.
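
To show why byte-oriented functions miscount UTF-8 text, here is a portable sketch that counts characters by skipping continuation bytes. It is an illustration only: it does not call utf8len or any other utf8.h function, the name sketch_utf8_count is hypothetical, and the input is assumed to be well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts characters (code points) in a NUL-terminated, well-formed
   UTF-8 string. Continuation bytes have the bit pattern 10xxxxxx;
   every other byte starts a new character. */
size_t sketch_utf8_count(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For the string "caf\xC3\xA9" (the word "café", whose last character takes two bytes), strlen returns 5 while sketch_utf8_count returns 4.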