The Unicode converter supports the following features:
The newer Unicode API contains two sets of functions: "standard" and "extended".
"Standard" functions, which begin with NWUS*, provide a simple interface using the platform’s native country and code page. They follow a fixed default conversion behavior (for example, in how an unmappable character is handled) that the developer cannot adjust.
"Extended" functions, which begin with NWUX*, allow the country code and code page to be specified, and allow extensive control over conversion options.
The standard functions convert between byte and Unicode strings and between different kinds of Unicode strings. They use only the converters supplied by Novell and do not allow anything other than the default behavior.
The following functions offer more flexibility and choices than the standard Unicode functions offer.
Unicode Converter is based on a set of converters implemented as DLLs. These converter files may be placed in any directory where the system searches for DLLs (for example, C:\WINDOWS\SYSTEM).
Converter files follow a naming convention that designates both the converter type and the supported platform. The format of that convention is UNI_[TYP].[PLT], where [TYP] is the converter type and [PLT] is the supported platform. Extensions to designate supported platforms are as follows:
There are four types of converters. All illustrations that follow use ".W32", the extension for Windows 95 and NT.
For example, UNI_1252.W32 is the converter for code page 1252 (W95/NT).
For example, UNI_C1.W32 is the collation converter for country code 1 (US).
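The naming convention can be sketched as a small helper. This is only an illustration: `converter_file_name` is a hypothetical function, not part of the Novell API, and it simply joins a type field and a platform extension in the UNI_[TYP].[PLT] pattern shown in the two examples above.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a Novell function): builds a converter file
   name of the form UNI_[TYP].[PLT].
   typ is the converter type field, for example "1252" or "C1".
   plt is the platform extension, for example "W32".
   Returns the length of the name, or -1 if the buffer is too small. */
int converter_file_name(char *out, size_t out_size,
                        const char *typ, const char *plt)
{
    /* "UNI_" + typ + "." + plt + terminating zero */
    size_t needed = 4 + strlen(typ) + 1 + strlen(plt) + 1;

    if (needed > out_size)
        return -1;
    snprintf(out, out_size, "UNI_%s.%s", typ, plt);
    return (int)(needed - 1);
}
```

Calling the helper with "1252" and "W32" yields UNI_1252.W32, and with "C1" and "W32" it yields UNI_C1.W32, matching the examples in the text.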
When an "extended" converter is opened, a handle is returned which is used in subsequent calls to extended converter functions. The developer may change various options for a particular converter without affecting other extended converters.
In contrast, once the standard converter is opened, it may be used by any number of programs. The developer cannot change preset standard converter options.
For Unicode-to-byte and byte-to-Unicode conversion, the following behavior is automatic for standard functions and is default for extended functions. Standard functions provide for this behavior only, but extended functions allow extensive modification.
Unmappable Unicode characters result in a call to a function handler, which forms the basis of lossless round trip conversion. The handler converts each unmappable Unicode character into a string of six byte characters as follows:
For example, if the character "#" is an unmappable Unicode "skull and crossbones" character (U+2620), the Unicode input string "abc#def" becomes the byte output string "abc[2620]def".
Scan/parse functions are disabled.
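The default Unicode-to-byte behavior can be sketched in standalone C. This is not the Novell implementation: the function names are hypothetical, the "code page" is assumed to be plain ASCII (so every code unit above 0x7F counts as unmappable), and uppercase hexadecimal digits are an assumption, since the text does not state the letter case.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the default "no map" handler: formats one
   16-bit Unicode character as the six byte characters "[NNNN]". */
void nomap_bracket_hex(unsigned short ch, char out[6])
{
    static const char hex[] = "0123456789ABCDEF";

    out[0] = '[';
    out[1] = hex[(ch >> 12) & 0xF];
    out[2] = hex[(ch >> 8) & 0xF];
    out[3] = hex[(ch >> 4) & 0xF];
    out[4] = hex[ch & 0xF];
    out[5] = ']';
}

/* Toy Unicode-to-byte conversion. ASSUMPTION: ASCII is the target code
   page. Input is a zero-terminated string of 16-bit characters; output
   is a zero-terminated byte string. Returns the number of bytes written
   (excluding the terminator), or -1 if the buffer is too small. */
int unicode_to_bytes_default(const unsigned short *in,
                             char *out, size_t out_size)
{
    size_t n = 0;

    for (; *in != 0; ++in) {
        if (*in <= 0x7F) {
            if (n + 1 >= out_size) return -1;   /* keep room for '\0' */
            out[n++] = (char)*in;
        } else {
            char seq[6];
            if (n + 6 >= out_size) return -1;   /* keep room for '\0' */
            nomap_bracket_hex(*in, seq);
            memcpy(out + n, seq, 6);
            n += 6;
        }
    }
    if (n >= out_size) return -1;
    out[n] = '\0';
    return (int)n;
}
```

No scanning of the input takes place in this direction, which mirrors the disabled scan/parse functions.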
Unmappable byte characters result in a substitution by the standard Unicode REPLACEMENT CHARACTER (U+FFFD).
The scan/parse functions are enabled, reversing the Unicode-to-byte function handler behavior. These scan/parse functions scan for the byte sequence "[NNNN]", where NNNN is a string of four hexadecimal digits. The scan/parse functions convert each such sequence to a single Unicode character whose value is U+NNNN. For example, if the character "#" were again the Unicode "skull and crossbones" character (U+2620), the byte input string "abc[2620]def" becomes the Unicode output string "abc#def".
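The byte-to-Unicode direction can be sketched the same way. Again, this is a standalone illustration with hypothetical names, not Novell code. It assumes an ASCII code page, so bytes above 0x7F are treated as unmappable and replaced with U+FFFD, and it accepts hexadecimal digits in either letter case.

```c
#include <stddef.h>

/* Returns the value of a hexadecimal digit, or -1 if c is not one. */
int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Scan/parse step: if s starts with "[NNNN]", stores U+NNNN in *ch and
   returns 1; otherwise returns 0. Never reads past the terminating zero,
   because a zero byte is not a hexadecimal digit. */
int parse_bracket_hex(const char *s, unsigned short *ch)
{
    unsigned value = 0;
    int i;

    if (s[0] != '[') return 0;
    for (i = 1; i <= 4; ++i) {
        int d = hex_value(s[i]);
        if (d < 0) return 0;
        value = value * 16 + (unsigned)d;
    }
    if (s[5] != ']') return 0;
    *ch = (unsigned short)value;
    return 1;
}

/* Toy byte-to-Unicode conversion (ASSUMPTION: ASCII code page).
   Returns the number of characters written (excluding the terminator),
   or -1 if the output buffer is too small. */
int bytes_to_unicode_default(const char *in,
                             unsigned short *out, size_t out_len)
{
    size_t n = 0;

    while (*in != '\0') {
        unsigned short ch;

        if (n + 1 >= out_len) return -1;        /* keep room for 0 */
        if (parse_bracket_hex(in, &ch)) {
            in += 6;                            /* consume "[NNNN]" */
        } else if ((unsigned char)*in <= 0x7F) {
            ch = (unsigned short)(unsigned char)*in;
            in += 1;
        } else {
            ch = 0xFFFD;                        /* REPLACEMENT CHARACTER */
            in += 1;
        }
        out[n++] = ch;
    }
    if (n >= out_len) return -1;
    out[n] = 0;
    return (int)n;
}
```

An incomplete sequence such as "[26" is copied through unchanged, because it does not match the full six-byte pattern.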
The standard converter allows only for Default Conversion Behavior, and the byte/Unicode converters use that behavior as a starting default. However, the extended functions allow you the following choices:
You can set options other than the system defaults by calling NWUXSetByteFunctions, NWUXSetUniFunctions, and NWUXSetNoMapAction.
For more information, see:
Many Unicode characters cannot be represented in a given local code page. However, situations arise when a Unicode string is converted to a local byte string, then converted back to Unicode. With the former Unicode API, any unmappable characters were lost in this process. The Unicode Converter API functions provide the capability to convert from Unicode to local and back to Unicode without losing any information.
Section 1.4, Supported Code Pages shows the code pages supported by Novell.
Although a standard Unicode converter behaves in fundamentally the same way on every platform this API supports, access to converters by any specific application can vary.
NetWare: Global variables are global to the entire system. A standard converter initialized with NWUSStandardUnicodeInit or NWUSStandardUnicodeOverride is the only standard converter available to any application on the platform. If the converter is changed through a call to NWUSStandardUnicodeOverride, the change also affects any application requesting standard conversions.
Windows 95, Windows 98, and Windows NT: Global variables are global only to a single process. Each process in which a thread calls NWUSStandardUnicodeInit or NWUSStandardUnicodeOverride gets its own copy of the global variables associated with the standard converter. It is therefore possible for one process to perform Unicode conversions with the system default code page converter and for another process to perform conversions with an explicitly specified converter. Each process that calls a standard converter must also release the converter and its associated resources with a corresponding call to NWUSStandardUnicodeRelease.
For related information, see Unicode Converter Implementation.
Converters in the Unicode Converter API set are installed during the installation process.
NWUSStandardUnicodeInit automatically loads a byte/Unicode converter for the native system code page, an uppercase converter, and a lowercase converter.
One of the NWUXLoad... functions must be called to load an extended converter. The load functions return a separate handle to each converter loaded. That handle must be passed to any other extended converter functions involving the respective converter. Each NWUXLoad... function called should be followed with a corresponding call to NWUXUnloadConverter when the converter is no longer needed.
For related information, see:
NWUSStandardUnicodeInit must be called before using any of the standard converter functions. Each call to NWUSStandardUnicodeInit should have a corresponding call to NWUSStandardUnicodeRelease when the conversion operations are complete.
NWUSStandardUnicodeInit automatically loads converters for the following kinds of conversions:
Other kinds of conversions require one or more extended converters and the functions in the extended (NWUX...) set.
Each of the extended converters is loaded with a separate NWUXLoad... function, and each such call returns a handle that is specific to the converter loaded. That handle is then passed to any other extended functions that require the respective converter. When an extended converter is no longer needed, it should be unloaded with a call to NWUXUnloadConverter.
For related information, see:
Four functions in this Unicode API conversion set produce unterminated byte string output from Unicode input: NWUSUnicodeToUntermByte, NWUSUnicodeToUntermBytePath, NWUXUnicodeToUntermByte, and NWUXUnicodeToUntermBytePath.
These functions are identical to NWUSUnicodeToByte, NWUSUnicodeToBytePath, NWUXUnicodeToByte, and NWUXUnicodeToBytePath, with one exception: the output byte string is unterminated. A trailing zero is not appended to the converted byte string.
In all other details (the kinds of values that can be passed, the operations performed, and the numeric values returned), the two sets of functions are identical.
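The practical difference shows up in buffer sizing. The sketch below is a standalone model, not the Novell functions: `unicode_to_bytes` is a hypothetical name, the code page is assumed to be ASCII, and the `terminate` flag stands in for the choice between the terminated and unterminated function variants.

```c
#include <stddef.h>

/* Toy converter (ASSUMPTION: ASCII code page, any character above 0x7F
   is an error). When terminate is nonzero, a trailing zero is appended;
   when it is zero, the output is left unterminated.
   Returns the number of converted bytes, excluding any terminator,
   or -1 on an unmappable character or a full buffer. */
int unicode_to_bytes(const unsigned short *in,
                     char *out, size_t out_size, int terminate)
{
    size_t n = 0;

    for (; *in != 0; ++in) {
        if (*in > 0x7F || n >= out_size)
            return -1;
        out[n++] = (char)*in;
    }
    if (terminate) {
        if (n >= out_size)
            return -1;          /* no room for the trailing zero */
        out[n] = '\0';
    }
    return (int)n;
}
```

In this model a three-character string fits into a three-byte buffer only in the unterminated case. The caller must then rely on the returned length rather than on a trailing zero to find the end of the data.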
For related information, see:
If the NWU_CONVERTER_NOT_FOUND error is returned when a standard converter is being initialized or an extended converter is being loaded, the converter DLL was not found in any of the expected locations. The system looks for the converter DLL (or NLM) in the usual system DLL/NLM search path. Where the system searches depends upon the operating system under which the application operates.
For example, in Windows 32 applications, the search order is
For NLM applications, the search order is
Differences between Novell and Microsoft Unicode translation tables in different languages have sometimes caused Unicode path strings to be stored with different path separators. Novell Unicode path conversion API functions called from any language recognize these differences and correctly convert any Unicode path separator back to the local path separator character.
Extended byte/Unicode converters can convert either from Unicode to bytes or from bytes to Unicode, depending on the function called after the converter is loaded. Variable converter options include the code page and the country code, specified in the parameters of NWUXLoadByteUnicodeConverter.
With the extended Unicode API functions, you can select any of three actions when an unmappable character is encountered:
For example, if a Unicode string is being converted to local code page in order to be displayed, a user-defined handler function could convert an unmappable character into a red blinking question mark. The default handler inserts the hex value of the unmappable character enclosed in square brackets in place of the character, as explained in Default Conversion Behavior.
Previously, an open code page had to be closed before a new code page could be opened. Using the new extended API functions, you can have multiple byte/Unicode converters loaded and active simultaneously, each with a different code page. For each converter, a handle is returned when the load function completes.
It is important to unload each converter when it is no longer needed, as explained in Unloading Converters. This helps avoid the possibility of tying up system resources needlessly.
This option defines what action to take when an unmappable Unicode character or (less likely) an unmappable byte is encountered during conversion. The options are to
These options are set individually for Unicode-to-byte conversion and byte-to-Unicode conversion. NWUXGetNoMapAction and NWUXSetNoMapAction get and set both options. Refer to the function reference for details.
It is important to note that the default for Unicode-to-byte conversion is to call the handler function. That handler is described in Default Conversion Behavior.
If the NoMapAction is set to NWU_SUBSTITUTE, a substitute byte or Unicode character is output when an unmappable character is encountered. By default, NWU_SUBSTITUTE is set for byte-to-Unicode conversion and not set for Unicode-to-byte conversion, which calls the handler function instead.
The default substitution byte or Unicode character is determined by the converter, since different countries often have different preferences on what to display for undefined characters. For byte-to-Unicode conversion, the substitution character is U+FFFD, designated as the Unicode REPLACEMENT CHARACTER. For Unicode-to-byte conversions, the converters generally set the default substitution byte to 0x03.
You can find out what the substitution character is through NWUXGetSubByte or NWUXGetSubUni. You can set a new substitution character through NWUXSetSubByte or NWUXSetSubUni.
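The interaction between the no-map action, the substitution byte, and a handler function can be modeled in a few lines of standalone C. Everything here is hypothetical: the enum, the `toy_converter` structure, and the function names are illustrative stand-ins rather than Novell types, the code page is assumed to be ASCII, and the sketch assumes the third action (besides substituting and calling a handler) is to return an error.

```c
#include <stddef.h>

/* Hypothetical model of the three no-map actions. */
typedef enum {
    NOMAP_RETURN_ERROR,
    NOMAP_SUBSTITUTE,
    NOMAP_CALL_HANDLER
} nomap_action;

/* A handler writes replacement bytes for ch into out (at most max bytes)
   and returns the number written, or -1 on failure. */
typedef int (*nomap_handler)(unsigned short ch, char *out, size_t max);

typedef struct {
    nomap_action  action;    /* what to do with an unmappable character */
    char          sub_byte;  /* used when action is NOMAP_SUBSTITUTE */
    nomap_handler handler;   /* used when action is NOMAP_CALL_HANDLER */
} toy_converter;

/* Toy Unicode-to-byte conversion (ASSUMPTION: ASCII code page).
   Returns the number of bytes written excluding the terminator, or -1. */
int toy_unicode_to_bytes(const toy_converter *cv, const unsigned short *in,
                         char *out, size_t out_size)
{
    size_t n = 0;

    for (; *in != 0; ++in) {
        if (n + 1 >= out_size) return -1;       /* keep room for '\0' */
        if (*in <= 0x7F) {
            out[n++] = (char)*in;
        } else if (cv->action == NOMAP_SUBSTITUTE) {
            out[n++] = cv->sub_byte;
        } else if (cv->action == NOMAP_CALL_HANDLER && cv->handler != NULL) {
            int written = cv->handler(*in, out + n, out_size - n - 1);
            if (written < 0) return -1;
            n += (size_t)written;
        } else {
            return -1;                          /* NOMAP_RETURN_ERROR */
        }
    }
    if (n >= out_size) return -1;
    out[n] = '\0';
    return (int)n;
}

/* Example user-defined handler: emits "(?)" for any unmappable character. */
int question_handler(unsigned short ch, char *out, size_t max)
{
    (void)ch;
    if (max < 3) return -1;
    out[0] = '(';
    out[1] = '?';
    out[2] = ')';
    return 3;
}
```

Because the action, substitution byte, and handler live in the converter structure, each converter in this model can be configured without affecting any other, which is the property the extended converters are described as having.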
For related information, see:
Two scan action options are defined, one for converting Unicode-to-byte, and one for converting byte-to-Unicode. In the extended API, options are enabled or disabled through NWUXSetScanAction. By default, the scanAction is disabled for Unicode-to-byte and enabled for byte-to-Unicode.
Enabling the option causes an automatic prescan of the input string for any special sequences and calls a parse function to replace such sequences with something else in the output string.
When the scanAction option is enabled, a Scan function is called internally to scan the input string before the conversion. If it finds a special sequence, the conversion is performed up to that point and then the Parse function is called internally. The functions are never called directly by the developer. Rather, they are set as explained in NoMap, Scan, and Parse Functions.
The system supplies default scan and parse functions for both byte-to-Unicode and Unicode-to-byte conversions. The byte-to-Unicode scan/parse functions operate as described in Default Conversion Behavior. Where # is the Unicode "skull and crossbones" character (U+2620), the byte input string "abc[2620]def" becomes the Unicode output string "abc#def".
By default, scan/parse action is disabled for Unicode-to-byte conversion because the need for such action is very rare. If it is enabled, it operates in a similar way to byte-to-Unicode conversion. It scans for two hexadecimal digits surrounded by square brackets in a Unicode input string and converts them into a byte character of the same hexadecimal value in the byte output string.
For related information, see Setting Scan/Parse Functions with an Extended Converter
NWUXSetByteFunctions and NWUXSetUniFunctions set the NoMap, Scan, and Parse functions for the extended converter. The NoMap function is enabled if the NoMapAction is set to NWU_CALL_HANDLER, and the Scan and Parse functions are enabled if the ScanAction option is set to NWU_ENABLED.
The default behavior (the only behavior available for the standard converter) is to use the system supplied UniNoMap function and the ByteScan/Parse functions as described in Default Conversion Behavior. These functions implement round-trip conversion from Unicode to byte to Unicode. The developer may replace any of these functions with custom versions.
For related information, see Setting Scan/Parse Functions with an Extended Converter
Unicode provides functions for converting a specified number of bytes from a byte string into Unicode characters.
For the standard converter, the functions are NWUSLenByteToUnicode and NWUSLenByteToUnicodePath.
For extended converters, the functions are NWUXLenByteToUnicode and NWUXLenByteToUnicodePath.
These functions behave exactly like their non-"Len" counterparts (NWUSByteToUnicode, NWUSByteToUnicodePath, NWUXByteToUnicode, and NWUXByteToUnicodePath), with the following exceptions:
If the length-specified function encounters a NULL before the specified number of bytes have been converted, it stops converting and returns NWU_EMBEDDED_NULL. However, it converts the bytes prior to the NULL, and returns the number of Unicode characters converted.
For example, consider the byte string abcdefgh passed as inbuf in the following call:
ccode = NWUSLenByteToUnicode(outbuf, MAX_LEN, inbuf, 5, &outlen);
On return ccode is zero, outbuf contains the Unicode string abcde, and outlen contains 5.
In contrast, given the byte string abc\0defg, the NWUSLenByteToUnicode function returns NWU_EMBEDDED_NULL. On return outbuf contains the Unicode string abc and outlen contains 3.
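The two outcomes above can be reproduced with a small standalone model. This is not NWUSLenByteToUnicode itself: the function name, the status values, and the byte-to-character mapping (each byte copied directly to a 16-bit character) are assumptions made for illustration, as is the choice to zero-terminate the output.

```c
#include <stddef.h>

/* Hypothetical status codes; the real NWU_EMBEDDED_NULL value is not
   given in the text. */
#define TOY_SUCCESS        0
#define TOY_EMBEDDED_NULL  1
#define TOY_BUFFER_FULL    2

/* Converts at most in_len bytes of in to 16-bit characters in out.
   Stops early at an embedded zero byte, keeping what was converted so far.
   *actual_len receives the number of characters converted. */
int toy_len_byte_to_unicode(unsigned short *out, size_t out_len,
                            const char *in, size_t in_len,
                            size_t *actual_len)
{
    size_t n = 0;
    int status = TOY_SUCCESS;

    while (n < in_len) {
        if (in[n] == '\0') {
            status = TOY_EMBEDDED_NULL;
            break;
        }
        if (n + 1 >= out_len) {             /* keep room for terminator */
            status = TOY_BUFFER_FULL;
            break;
        }
        out[n] = (unsigned short)(unsigned char)in[n];
        ++n;
    }
    if (n < out_len)
        out[n] = 0;                         /* ASSUMPTION: terminated */
    *actual_len = n;
    return status;
}
```

With the input abcdefgh and a length of 5, the model reports success and five converted characters. With abc\0defg and the same length, it reports the embedded-null status and three converted characters.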
For related information, see
Case converter options are set with the caseFlag parameter and include the following possibilities:
For related information, see: Converting Unicode String Case with an Extended Converter
Previous versions of the Novell Unicode API converted code page strings only into lower case Unicode strings. This limitation is now removed so that Unicode strings are no longer limited to lower case. Unicode strings can now be converted to upper case or lower case with the standard functions, and to upper, lower, or title case (the first letter of each word is capitalized) with the extended functions.
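The three case modes can be illustrated with the standard C wide-character functions. This sketch does not use the Novell case converters: `convert_case` and the `case_mode` enum are hypothetical, words are assumed to be separated by whitespace, and title case is assumed to lower the remaining letters of each word, which the text does not specify.

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical case modes, mirroring upper, lower, and title case. */
typedef enum { CASE_UPPER, CASE_LOWER, CASE_TITLE } case_mode;

/* Converts the zero-terminated wide string s in place.
   ASSUMPTIONS: words are separated by whitespace, and title case lowers
   every letter that is not the first letter of a word. */
void convert_case(wchar_t *s, case_mode mode)
{
    int at_word_start = 1;

    for (; *s != L'\0'; ++s) {
        if (iswspace((wint_t)*s)) {
            at_word_start = 1;
            continue;
        }
        if (mode == CASE_UPPER || (mode == CASE_TITLE && at_word_start))
            *s = (wchar_t)towupper((wint_t)*s);
        else
            *s = (wchar_t)towlower((wint_t)*s);
        at_word_start = 0;
    }
}
```

The results of towupper and towlower depend on the current C locale, which loosely parallels the way the Novell case converters are tied to a loaded converter rather than to a fixed table.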
For related information, see