Photon is designed to handle international characters. Following the Unicode Standard (ISO/IEC 10646), Photon provides developers with the ability to create applications that can easily support the world's major languages and scripts.
Unicode is modeled on the ASCII character set, but uses a 16-bit encoding to support full multilingual text. There's no need for escape sequences or control codes when specifying any character in any language. Unicode encoding treats all characters (alphabetic characters, ideographs, and symbols alike) in exactly the same way.
In designing the keyboard driver and the character handling mechanisms, we referred to the X11 keyboard extensions and ISO standards 9995 and 10646-1.
This appendix includes the following:
ANSI C includes the following concepts:
The following ANSI C functions (described in the Watcom C Library Reference) are used for converting between wide-character encoding and multibyte encoding:
In addition, the Photon library provides the following non-ANSI functions (described in the Photon Library Reference) for working with multibyte characters:
In our C libraries, "wide characters" are assumed to be Unicode, and "multibyte" is a synonym for UTF-8. The wchar_t type is defined as unsigned short, and wctomb() and mbtowc() implement the UTF-8 encoding.
Photon libraries use multibyte-character strings: any function that handles strings should be able to handle a valid UTF-8 string, and functions that return a string can return a multibyte-character string. This also applies to widget resources. The graphics drivers and font server assume that all strings use UTF-8.
Unicode is a 16-bit encoding scheme defined in the ISO/IEC 10646 standard:
Glyphs | Range |
---|---|
Nondisplayable keys | 0xF000 - 0xF0FF |
Cursor font | 0xE900 - 0xE9FF |
For Unicode character values, see /usr/include/photon/PkKeyDef.h. For more information about Unicode, see the Unicode Consortium's website at www.unicode.org.
Formerly known as UTF-2, the UTF-8 (for "8-bit form") transformation format is designed to address the use of Unicode character data in 8-bit UNIX environments. Each 16-bit Unicode value is encoded as a one-, two-, or three-byte UTF-8 sequence.
Here are some of the main features of UTF-8:
isInitialByte = ((byte & 0xC0) != 0x80);
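Because every continuation byte has the form 10XXXXXX, the initial-byte test above is enough to count characters and to step backward through a string without decoding it. The following is a minimal sketch of that idea; the function names are hypothetical and aren't part of the Photon library, and the code assumes its input is already valid UTF-8:

```c
#include <assert.h>
#include <stddef.h>

/* Nonzero if `byte` starts a UTF-8 sequence, i.e. it isn't a
 * 10XXXXXX continuation byte. */
int utf8_is_initial_byte(unsigned char byte)
{
    return (byte & 0xC0) != 0x80;
}

/* Count characters (not bytes) in a NUL-terminated UTF-8 string. */
size_t utf8_char_count(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (utf8_is_initial_byte((unsigned char)*s))
            ++count;
    }
    return count;
}

/* Given a byte offset `pos` into `s`, return the offset of the first
 * byte of the character that precedes it (or 0 at the start). */
size_t utf8_prev_char(const char *s, size_t pos)
{
    assert(s != NULL);
    while (pos > 0) {
        --pos;
        if (utf8_is_initial_byte((unsigned char)s[pos]))
            break;
    }
    return pos;
}
```

For example, utf8_char_count() reports one character for the two-byte sequence 0xC3 0xA8, where strlen() would report two bytes.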
The actual encoding is this:
The following table shows the binary form of each byte of the encoding and the minimum and maximum values for the characters represented by 1-, 2-, and 3-byte encodings:
Length | First byte | Following bytes | Min. value | Max. value |
---|---|---|---|---|
Single byte | 0XXXXXXX | N/A | 0x0000 | 0x007F |
Two bytes | 110XXXXX | 10XXXXXX | 0x0080 | 0x07FF |
Three bytes | 1110XXXX | 10XXXXXX | 0x0800 | 0xFFFF |
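The table can be turned directly into an encoder and a decoder for 16-bit values. The sketch below is illustrative only: the function names are hypothetical, it isn't the library's wctomb() or mbtowc() implementation, and rejecting sequences that encode a value below the table's minimum for their length is a design choice made here, not something the text above requires.

```c
#include <assert.h>
#include <stddef.h>

/* Encode one 16-bit value into buf, which must hold at least 3 bytes.
 * Returns the number of bytes written (1, 2, or 3). */
int utf8_encode16(unsigned short wc, unsigned char *buf)
{
    assert(buf != NULL);
    if (wc <= 0x007F) {                       /* 0XXXXXXX */
        buf[0] = (unsigned char)wc;
        return 1;
    }
    if (wc <= 0x07FF) {                       /* 110XXXXX 10XXXXXX */
        buf[0] = (unsigned char)(0xC0 | (wc >> 6));
        buf[1] = (unsigned char)(0x80 | (wc & 0x3F));
        return 2;
    }
    /* 1110XXXX 10XXXXXX 10XXXXXX */
    buf[0] = (unsigned char)(0xE0 | ((wc >> 12) & 0x0F));
    buf[1] = (unsigned char)(0x80 | ((wc >> 6) & 0x3F));
    buf[2] = (unsigned char)(0x80 | (wc & 0x3F));
    return 3;
}

/* Decode one sequence from s, where n bytes are available.
 * Returns the number of bytes consumed, or -1 if the sequence is
 * truncated, malformed, or encodes a value below the minimum for
 * its length. */
int utf8_decode16(const unsigned char *s, size_t n, unsigned short *out)
{
    unsigned int v;

    assert(s != NULL && out != NULL);
    if (n == 0)
        return -1;
    if (s[0] < 0x80) {
        *out = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0) {
        if (n < 2 || (s[1] & 0xC0) != 0x80)
            return -1;
        v = ((unsigned int)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        if (v < 0x0080)
            return -1;
        *out = (unsigned short)v;
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0) {
        if (n < 3 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
            return -1;
        v = ((unsigned int)(s[0] & 0x0F) << 12)
          | ((unsigned int)(s[1] & 0x3F) << 6)
          | (s[2] & 0x3F);
        if (v < 0x0800)
            return -1;
        *out = (unsigned short)v;
        return 3;
    }
    return -1;   /* continuation byte or a longer form not covered here */
}
```

As a check against the table, 0x00E8 falls in the two-byte range and encodes as 0xC3 0xA8, while 0x20AC falls in the three-byte range and encodes as 0xE2 0x82 0xAC.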
If your application needs to work with other character encodings, you'll need to convert to and from UTF-8. Character sets are defined in the file /usr/photon/translations/charsets, and include:
The following translation functions are provided, and are described in the Photon Library Reference:
These functions are supplied only in static form in the Photon library phexlib3r.lib. The prototypes are in <photon/PxProto.h>.
The keyboard driver is table-driven; it handles any keyboard with 127 or fewer physical keys.
A keypress is stored in a structure of type PhKeyEvent_t (described in the Photon Library Reference).
The text widgets use the key_sym field for displayable characters. These widgets also check it to detect cursor movement. For example, if the content of the field is Pk_Left, the cursor is moved left. The key_sym is Pk_Left for both the left cursor key and the numeric keypad left cursor key (assuming NumLock is off).
QNX supports "dead" keys and "compose" key sequences to generate key_syms that aren't on the keyboard. The key_sym field is valid only on a key press - not on a key release - to ensure that you get only one symbol, not two.
For example, if the keyboard has a dead accent key (for example, `) and the user presses it followed by e, the key_sym is an "e" with a grave accent (è). If the e key isn't released, and then another group of keys (or more compose or dead key sequences) is pressed, the key_syms would have to be stacked for the final releases.
If an invalid key is pressed during a compose sequence, the keyboard drivers generate key_syms for all the intermediate keys, but not an actual press or release.
For a list of compose sequences, see the International Character Support chapter of the Photon User's Guide.
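To make the dead-key example above concrete, the following sketch shows one way a table-driven lookup could map a dead key plus a base key to a composed Unicode value. It's purely illustrative: the structure, table contents, and function name are hypothetical, and it doesn't reflect how the keyboard driver actually stores or processes its compose sequences.

```c
#include <stddef.h>

/* Hypothetical table entry: dead key + base key -> composed symbol. */
struct compose_entry {
    unsigned short dead;      /* symbol of the dead key */
    unsigned short base;      /* symbol of the key pressed after it */
    unsigned short composed;  /* resulting Unicode value */
};

/* A few sample pairs; a real table would be much larger. */
static const struct compose_entry compose_table[] = {
    { 0x0060, 'e', 0x00E8 },  /* grave + e -> e with grave accent */
    { 0x0060, 'a', 0x00E0 },  /* grave + a -> a with grave accent */
    { 0x00B4, 'e', 0x00E9 }   /* acute + e -> e with acute accent */
};

/* Return the composed value, or 0 if the pair isn't a known sequence. */
unsigned short compose_dead_key(unsigned short dead, unsigned short base)
{
    size_t i;

    for (i = 0; i < sizeof compose_table / sizeof compose_table[0]; ++i) {
        if (compose_table[i].dead == dead && compose_table[i].base == base)
            return compose_table[i].composed;
    }
    return 0;
}
```

A return value of 0 stands in for the invalid-key case: a caller modeled on the behavior described above would then emit the symbols for the intermediate keys rather than a composed character.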