12.2 Unicode Character Classifications

The unitype function returns a combination of the following flags to classify a specified Unicode character. Each flag is a bit mask, so several flags can be ORed together in a single value. The flags are defined in the unilib.h file.

Constant        Value        Description
--------------  -----------  ------------------------------------------
UNI_UNDEF       0x00000000   No classification
UNI_CNTRL       0x00000001   Control character
UNI_SPACE       0x00000002   Non-printing space
UNI_PRINT       0x00000004   Visible print character
UNI_SPECIAL     0x00000008   Dingbats, special symbols, etc.
UNI_PUNCT       0x00000010   General punctuation
UNI_DIGIT       0x00000020   Decimal digit
UNI_XDIGIT      0x00000040   Hexadecimal digit
UNI_RESERVED1   0x00000080   Reserved for future use
UNI_LOWER       0x00000100   Lowercase, if applicable
UNI_UPPER       0x00000200   Uppercase, if applicable
UNI_RESERVED2   0x00000400   Reserved for future use
UNI_ALPHA       0x00000800   Non-number, non-punctuation
UNI_LATIN       0x00001000   Latin-based
UNI_GREEK       0x00002000   Greek
UNI_CYRILLIC    0x00004000   Cyrillic
UNI_HEBREW      0x00008000   Hebrew
UNI_ARABIC      0x00010000   Arabic
UNI_CJK         0x00020000   Chinese, Japanese, or Korean characters
UNI_INDIAN      0x00040000   Indian: Devanagari, Bengali, Tamil, etc.
UNI_SEASIA      0x00080000   Southeast Asia: Thai, Lao
UNI_CENASIA     0x00100000   Central Asia: Armenian, Tibetan, Georgian
UNI_OTHER       0x80000000   None of the above
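
Because the flags are bit masks, a returned classification is normally tested with a bitwise AND. The following is a minimal sketch of that pattern, not the library's implementation. It redefines a few of the flag values listed above so that it compiles without unilib.h. The function demo_unitype is a hypothetical stand-in for unitype: its signature, its 16-bit character type, and the flag combinations it returns for ASCII input are all assumptions made for illustration. The helpers has_any and has_all are likewise hypothetical.

```c
#include <stdint.h>

/* Flag values copied from the list above; real code would get them from unilib.h. */
#define UNI_UNDEF   0x00000000u
#define UNI_PRINT   0x00000004u
#define UNI_DIGIT   0x00000020u
#define UNI_XDIGIT  0x00000040u
#define UNI_LOWER   0x00000100u
#define UNI_UPPER   0x00000200u
#define UNI_ALPHA   0x00000800u
#define UNI_LATIN   0x00001000u

/* Hypothetical stand-in for unitype. It handles ASCII letters and digits
 * only, and the flag combinations it returns are assumed, not documented. */
static uint32_t demo_unitype(uint16_t ch)
{
    if (ch >= '0' && ch <= '9')
        return UNI_PRINT | UNI_DIGIT | UNI_XDIGIT;

    if (ch >= 'a' && ch <= 'z') {
        uint32_t flags = UNI_PRINT | UNI_ALPHA | UNI_LOWER | UNI_LATIN;
        if (ch <= 'f')
            flags |= UNI_XDIGIT;   /* a-f are also hexadecimal digits */
        return flags;
    }

    if (ch >= 'A' && ch <= 'Z') {
        uint32_t flags = UNI_PRINT | UNI_ALPHA | UNI_UPPER | UNI_LATIN;
        if (ch <= 'F')
            flags |= UNI_XDIGIT;   /* A-F are also hexadecimal digits */
        return flags;
    }

    return UNI_UNDEF;              /* everything else: no classification */
}

/* True if at least one bit of mask is set in flags. */
static int has_any(uint32_t flags, uint32_t mask)
{
    return (flags & mask) != 0;
}

/* True only if every bit of mask is set in flags. */
static int has_all(uint32_t flags, uint32_t mask)
{
    return (flags & mask) == mask;
}
```

For example, has_all(demo_unitype('a'), UNI_ALPHA | UNI_LOWER) checks for a lowercase letter, and has_any(flags, UNI_DIGIT | UNI_XDIGIT) accepts either kind of digit. UNI_UNDEF has the value 0, so it cannot be detected with a bitwise AND; compare the whole result for equality instead (flags == UNI_UNDEF).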