HP-UX 11i v3 Internationalization Features

Table of Contents
Abstract
Intended Audience
1. Introduction
2.
iconv Command
Unix 2003-Related Changes
iconv Codeset Converter Config File Changes: system.config.iconv
eucset Command
Abstract
This document summarizes the Internationalization features that are new or updated in HP-UX 11i v3.

Intended Audience
This document is intended primarily for general users and system administrators who are familiar with the Internationalization features of the HP-UX operating system environment.

1. Introduction
2. Standards-Related Updates

Unicode 5.0
HP-UX 11i v3 provides system-level support for Unicode 5.0. Unicode 5.0 is aligned with the revised ISO 10646-2:2003 standard, including Amendments 1 and 2, and defines 99,089 characters in total. It adds 48,830 characters relative to the previously supported Unicode 3.0. The most notable additions are new CJK ideographic characters.
Big5-2003 and CNS11643
System-level support is provided for Big5-2003 and CNS11643-2004, two Traditional Chinese character sets. The Big5 standard, established in 1984 and revised in 2003 (Big5-2003), was designed as a small, basic character set for encoding contemporary Traditional Chinese characters. Big5-2003 is defined as a Big5 extension and has only two planes, with 13,051 Traditional Chinese characters and 778 symbols, for a total of 13,829 characters.
Also in these converter tables, the galley character has been changed from Unicode 0xFFFF (GB18030 0x8431A439) to Unicode 0xFFFD (GB18030 0x8431A437), because 0xFFFF is not a valid Unicode character.

UNIX 2003
UNIX 2003 is an industry product standard specified by the Single UNIX® Specification, Version 3. Conforming to this standard in the area of internationalization requires changes to some commands, locale binaries, and libc functions. These changes are discussed later in this document.
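The galley character change in the GB18030 converter tables can be cross-checked outside HP-UX. The following is a minimal sketch that assumes Python's built-in gb18030 codec as a stand-in for the HP-UX converter tables. The helper name gb18030_hex is hypothetical and used only for illustration.

```python
# Assumption: Python's built-in "gb18030" codec stands in for the HP-UX
# iconv converter tables; it is used here only to check the byte values.

REPLACEMENT_CHARACTER = "\ufffd"  # U+FFFD, the new galley character
OLD_GALLEY_CHARACTER = "\uffff"   # U+FFFF, the old galley character


def gb18030_hex(ch: str) -> str:
    """Return the GB18030 encoding of one character as uppercase hex."""
    return ch.encode("gb18030").hex().upper()


if __name__ == "__main__":
    print("U+FFFD ->", gb18030_hex(REPLACEMENT_CHARACTER))
    print("U+FFFF ->", gb18030_hex(OLD_GALLEY_CHARACTER))
```

Both code points fall in the four-byte region of GB18030, which is why each maps to an eight-digit hexadecimal value.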
3. Locales

Unicode 5.0 Support
As stated earlier in this document, HP-UX 11i v3 includes Unicode 5.0 support in all supported utf8 locales. All 52 previously supported, system-supplied utf8 locales have been updated to support the character repertoire specified by the Unicode 5.0 standard. In addition, all new 11i v3 utf8 locales (refer to the "New locales: Baltic/Russia/Ukraine/Latin America" section) align with the Unicode 5.0 standard. Note that as of 11i v3, the locale binaries provided are version 3.
Customers who provide their own customized locale binaries must rebuild them on 11i v3 systems with the localedef(1M) command to generate correct version 3 binaries for the locales.3 area. Locale binaries built on previous releases, such as those installed in the locales.1 or locales.2 subdirectories, must not be installed into the locales.3 area. The "Note" section of the localedef(1M) manual page has documented this restriction for the past several releases.
Latvia       lv_LV.iso88594    lv_LV.iso885913    lv_LV.utf8
Lithuania    lt_LT.iso88594    lt_LT.iso885913    lt_LT.utf8

Russia and Ukraine
Country    MS CP1251 Based    Koi8-R Based    UTF-8 Based
Russia     ru_RU.cp1251       ru_RU.koi8r
Ukraine    uk_UA.cp1251                       uk_UA.utf8

Note that for the PA-RISC versions, the locale binaries provided are version 3. For more information regarding support levels, refer to the "New Locale Versioning" section in this document.
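The locale names listed above follow the conventional language_TERRITORY.codeset form, optionally followed by an @modifier (as in the @euro locales). As an illustration only, the sketch below splits such a name into its parts. The function name parse_locale_name and the regular expression are assumptions made for this example and are not part of HP-UX.

```python
import re

# Hypothetical pattern for names such as "lv_LV.iso885913" or
# "da_DK.iso885915@euro"; this is an illustration, not an HP-UX interface.
LOCALE_PATTERN = re.compile(
    r"^(?P<language>[a-z]{2})_(?P<territory>[A-Z]{2})"
    r"\.(?P<codeset>[A-Za-z0-9]+)(?:@(?P<modifier>\w+))?$"
)


def parse_locale_name(name: str) -> dict:
    """Split a locale name into language, territory, codeset and modifier."""
    match = LOCALE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized locale name: {name!r}")
    return match.groupdict()


if __name__ == "__main__":
    for name in ("lv_LV.iso885913", "uk_UA.cp1251", "da_DK.iso885915@euro"):
        print(name, "->", parse_locale_name(name))
```

Names such as C or POSIX do not fit this pattern, so the sketch rejects them with a ValueError.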
4. Code Set Conversions

Unicode 5.0 additions have been made to the iconv converters to support new Unicode 5.0 characters, surrogate characters, byte-order marks, and all Unicode-specified transformation forms (UTF-8, UTF-16, and UTF-32, in both big-endian and little-endian forms). Refer to the system.config.iconv file under /usr/lib/nls/iconv for the complete list of iconv converters supported as part of the base operating system.
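To make the transformation forms concrete, the following sketch encodes one supplementary-plane character in each form. It assumes Python's standard codecs rather than the HP-UX iconv converters, and the helper names encoded_forms and with_bom are hypothetical.

```python
import codecs

# U+20000 is a CJK ideograph outside the Basic Multilingual Plane, so its
# UTF-16 form needs a surrogate pair.
SAMPLE = "\U00020000"


def encoded_forms(text: str) -> dict:
    """Encode text in several Unicode transformation forms (no BOM added)."""
    names = ["utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"]
    return {name: text.encode(name).hex() for name in names}


def with_bom(text: str, big_endian: bool = True) -> bytes:
    """Prefix UTF-16 data with the byte-order mark that matches its endianness."""
    bom = codecs.BOM_UTF16_BE if big_endian else codecs.BOM_UTF16_LE
    codec = "utf-16-be" if big_endian else "utf-16-le"
    return bom + text.encode(codec)


if __name__ == "__main__":
    for name, value in encoded_forms(SAMPLE).items():
        print(f"{name:10s} {value}")
    print("with BOM  ", with_bom(SAMPLE).hex())
```

In the UTF-16 forms the character appears as the surrogate pair D840 DC00, while UTF-32 stores the code point 0x20000 directly.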
ucs2le    • • • • • •
ucs4      • • • - • •
ucs4be    • • • - • •
ucs4le    • • • • •
utf8      • • • • • • •

Note 1: The default endianness depends on the operating system. HP-UX is a big-endian OS. On HP-UX, the iconv command interprets UTF-16/32 data as big-endian in the absence of a byte order mark.
Note 2: On HP-UX, the iconv command treats UCS-2 and UTF-16, and likewise UCS-4 and UTF-32, as functionally equivalent.
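The rule in Note 1 can be sketched as follows. This is an illustration in Python, not the iconv implementation: the function name decode_utf16_big_endian_default is hypothetical, and the logic simply honors a byte order mark when one is present and otherwise assumes big-endian data.

```python
import codecs


def decode_utf16_big_endian_default(data: bytes) -> str:
    """Decode UTF-16 data, assuming big-endian when no byte order mark is present."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    # No byte order mark: fall back to big-endian, as Note 1 describes.
    return data.decode("utf-16-be")


if __name__ == "__main__":
    print(decode_utf16_big_endian_default(b"\x00H\x00P"))          # no BOM
    print(decode_utf16_big_endian_default(b"\xff\xfeH\x00P\x00"))  # little-endian BOM
```

Python's own "utf-16" codec falls back to the byte order of the host machine when no byte order mark is present, so the explicit big-endian fallback is what models the behavior described in the note.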
Other Character Set Converters
Additional converters are provided in HP-UX 11i v3 for the Baltic region (Estonia, Latvia, and Lithuania) to convert between ISO 8859-4, ISO 8859-13, and all Unicode variants. New converters for Russia convert between KOI8-R and all Unicode variants. Converters are also provided between Code Page 924 (Latin-9 EBCDIC) and ISO 8859-15, as well as all Unicode variants. Tru64 UNIX-specific iconv converters are provided as well.
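The kind of conversion these converters perform can be illustrated with a short sketch. It assumes Python's built-in koi8_r, iso8859_4, and iso8859_13 codecs as stand-ins for the HP-UX iconv converters, and the convert helper is a hypothetical name that mirrors the from/to shape of an iconv call.

```python
def convert(data: bytes, from_code: str, to_code: str) -> bytes:
    """Convert bytes from one codeset to another by going through Unicode."""
    return data.decode(from_code).encode(to_code)


if __name__ == "__main__":
    # Russian text: UTF-8 to KOI8-R and back.
    russian_utf8 = "Привет".encode("utf-8")
    russian_koi8r = convert(russian_utf8, "utf-8", "koi8_r")
    print(len(russian_utf8), "bytes in UTF-8,", len(russian_koi8r), "bytes in KOI8-R")

    # Baltic text: ISO 8859-4 to ISO 8859-13.
    baltic_8859_4 = "Ačiū".encode("iso8859_4")
    baltic_8859_13 = convert(baltic_8859_4, "iso8859_4", "iso8859_13")
    print(baltic_8859_13.decode("iso8859_13"))
```

Routing every conversion through Unicode is a simplification chosen for this sketch. It does not describe how the HP-UX converter tables are organized.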
5. Commands and Libraries

Libc Library Routines
The following two libc functions have changed to conform to the UNIX 2003 standard.

strfmon()
The behavior of the strfmon() function has changed slightly to align with the UNIX 2003 standard. The system locales have been changed accordingly, so strfmon() behaves as before unless an application uses non-system locales.
The localedef command change should be transparent to customers unless they use their own customized locales instead of the system-provided locales. In that case, the behavior of the strfmon() function may change, depending on how the LC_MONETARY sections of the customized locales are defined. Applications linked with the archive libc library are not affected unless they are recompiled for the current release.
eucset Command
The eucset command sets and gets the display widths of EUC characters for the terminal. To support the alternate display-width properties of Asian UTF8 locales, a new ASIAN_UTF8 argument has been added for the -c option. For more information, refer to the eucset(1) manual page.

Messaging Commands (mkcatdefs, dspmsg and dspcat)
The new messaging commands mkcatdefs, dspmsg, and dspcat have been added to HP-UX for compatibility with Tru64 UNIX.
6. Font and Printer Enhancements

Internationalized PostScript Printing Support: psfontpf
A new PostScript printer filter, psfontpf, has been added. It enables the printing of non-English international characters in text files and in web pages displayed by Mozilla/Firefox, on printers that support the PostScript level 2 or level 3 printing language.
- New encodings, KS X 1001:2002, KS X 1003:1993, and ISO10646 BMP/plane2, are available.

Simplified Chinese
- ZYCJKHei and ZYCJKSun have been enhanced to include the latest GB18030 and ISO10646 characters.
- New typefaces, FZFangSong and FZKai, are available.
- New encodings, ISO10646 BMP/plane2, are available.

Traditional Chinese
- ARMINGTil has been enhanced to include the latest Big5-2003, CNS11643-2004, and ISO10646 characters.
characters. Xlib and XLocale database modules have been enhanced so that "?" or "::" glyphs are displayed.

TrueType Fonts for European Codesets
Additional TrueType fonts are provided for European languages in HP-UX 11i v3. TrueType fonts are used by layered technologies such as Java, X-Windows, and printer modules, which require them to meet European market requirements. The glyph patterns are designed based on Unicode standards and are indexed by Unicode code point.
7.
8. Obsolescence

Deprecated Asian Functionality
The following functionality is deprecated and will be removed in the next major release of HP-UX:
• Asian printer lp models: LIPS3, LPS, hpc1200aj, hpc1200ak, hpc1200ac, hpc1200at, and hpc1205at
• Japanese-specific utility/library routines described in /usr/share/doc/JpnCmdLib.
9. Glossary

NLS (National Language Support): HP-UX provides support for various regions and languages through its I18N software subsystem. The NLS components on an HP-UX system are installed in a directory tree rooted at /usr/lib/nls.

Unicode: International character set standard that defines the characters used in most of the world’s languages for computer use. At the time of writing, the current version of the Unicode standard is 5.0.
Appendix: Summary of Locale and Codeset Conversion Support in HP-UX 11i v3

Locales
HP-UX 11i v3 supports the following 183 locales:

Language    Country/Region      Locales
default     default             C C.iso88591 C.iso885915 C.utf8
POSIX       POSIX               POSIX
Arabic      Algeria             ar_DZ.arabic8 ar_DZ.utf8
            Saudi Arabia        ar_SA.arabic8 ar_SA.iso88596 ar_SA.utf8
Bulgarian   Bulgaria            bg_BG.iso88595 bg_BG.utf8
Czech       Czech Republic      cs_CZ.iso88592 cs_CZ.utf8
Danish      Denmark             da_DK.iso88591 da_DK.iso885915@euro da_DK.roman8 da_DK.
                                en_US.roman8 en_US.utf8
Spanish     Argentina           es_AR.iso88591 es_AR.iso885915 es_AR.utf8
            Bolivia             es_BO.iso88591 es_BO.iso885915 es_BO.utf8
            Chile               es_CL.iso88591 es_CL.iso885915 es_CL.utf8
            Colombia            es_CO.iso88591 es_CO.iso885915 es_CO.utf8
            Costa Rica          es_CR.iso88591 es_CR.iso885915 es_CR.utf8
            Dominican Republic  es_DO.iso88591 es_DO.iso885915 es_DO.utf8
            Ecuador             es_EC.iso88591 es_EC.iso885915 es_EC.utf8
            Spain               es_ES.iso88591 es_ES.iso885915@euro es_ES.roman8 es_ES.utf8
            Guatemala           es_GT.iso88591 es_GT.
                                es_NI.iso885915 es_NI.utf8
            Panama              es_PA.iso88591 es_PA.iso885915 es_PA.utf8
            Peru                es_PE.iso88591 es_PE.iso885915 es_PE.utf8
            Puerto Rico         es_PR.iso88591 es_PR.iso885915 es_PR.utf8
            Paraguay            es_PY.iso88591 es_PY.iso885915 es_PY.utf8
            El Salvador         es_SV.iso88591 es_SV.iso885915 es_SV.utf8
            United States       es_US.iso88591 es_US.iso885915 es_US.utf8
            Uruguay             es_UY.iso88591 es_UY.iso885915 es_UY.utf8
            Venezuela           es_VE.iso88591 es_VE.iso885915 es_VE.utf8
Estonian    Estonia             et_EE.iso885915 et_EE.iso88594 et_EE.
            France              fr_FR.iso88591 fr_FR.iso885915@euro fr_FR.roman8 fr_FR.utf8
Croatian    Croatia             hr_HR.iso88592 hr_HR.utf8
Hungarian   Hungary             hu_HU.iso88592 hu_HU.utf8
Icelandic   Iceland             is_IS.iso88591 is_IS.iso885915@euro is_IS.roman8 is_IS.utf8
Italian     Italy               it_IT.iso88591 it_IT.iso885915@euro it_IT.roman8 it_IT.utf8
Hebrew      Israel              iw_IL.hebrew8 iw_IL.iso88598 iw_IL.utf8
Japanese    Japan               ja_JP.SJIS ja_JP.eucJP ja_JP.kana8 ja_JP.utf8
Korean      Korea               ko_KR.eucKR ko_KR.utf8
Lithuanian  Lithuania           lt_LT.
                                no_NO.roman8 no_NO.utf8
Polish      Poland              pl_PL.iso88592 pl_PL.utf8
Portuguese  Brazil              pt_BR.iso88591 pt_BR.iso885915 pt_BR.utf8
            Portugal            pt_PT.iso88591 pt_PT.iso885915@euro pt_PT.roman8 pt_PT.utf8
Rumanian    Romania             ro_RO.iso88592 ro_RO.utf8
Russian     Russian Federation  ru_RU.cp1251 ru_RU.iso88595 ru_RU.koi8r ru_RU.utf8
Slovakian   Slovakia            sk_SK.iso88592 sk_SK.utf8
Slovenian   Slovenia            sl_SI.iso88592 sl_SI.utf8
Swedish     Sweden              sv_SE.iso88591 sv_SE.iso885915@euro sv_SE.roman8 sv_SE.
            Taiwan              zh_TW.big5 zh_TW.ccdc zh_TW.eucTW zh_TW.
iso85       iso8859_5, iso88595, ISO8859-5
iso86       iso8859_6, iso88596, ISO8859-6
iso87       iso8859_7, iso88597, ISO8859-7
iso88       iso8859_8, iso88598, ISO8859-8
iso89       iso8859_9, iso88599, ISO8859-9
iso813      iso8859_13, iso885913, ISO8859-13
iso815      iso8859_15, iso885915, ISO8859-15
itale       italian_e
japae       japanese_e, ibmkanji
jis         JIS-2022-JP, JIS7, ISO-2022-JP, iso-2022-jp
katae       katakana_e
koi8r       KOI8-R
kore5       korean15, eucKR, deckorean
koree       korean_e, cp933
roc15       ccdc
roma8       roman8
sjis        japa5,
Note: U* represents all Unicode variants supported by the iconv command in HP-UX 11i v3, namely, ucs2, ucs2be, ucs2le, ucs4, ucs4be, ucs4le and utf8.
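As an illustration of this shorthand, the sketch below expands U* in a list of target codesets into the seven Unicode variants named in the note. The function name expand_targets is hypothetical, and the sample row is taken from the conversion table in this appendix.

```python
# The seven Unicode variants that the note says U* stands for.
UNICODE_VARIANTS = ["ucs2", "ucs2be", "ucs2le", "ucs4", "ucs4be", "ucs4le", "utf8"]


def expand_targets(targets: list) -> list:
    """Replace the U* shorthand with the explicit list of Unicode variants."""
    expanded = []
    for name in targets:
        names = UNICODE_VARIANTS if name == "U*" else [name]
        for item in names:
            if item not in expanded:  # keep the first occurrence only
                expanded.append(item)
    return expanded


if __name__ == "__main__":
    # Row "cp1146  iso815, U*" from the conversion table.
    print(expand_targets(["iso815", "U*"]))
```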
cp1146      iso815, U*
cp1147      iso815, U*
cp1148      iso815, U*
cp1149      iso815, U*
cp1250      U*
cp1251      U*
cp1252      U*
cp1253      U*
cp1254      U*
cp1255      U*
cp1256      U*
cp1257      U*
cp1258      U*
dechanyu    big5, eucTW
dechanzi    hp15CN
deckanji    eucJP, sjis
engle       iso81, roma8
eucJP       cp930, cp939, deckanji, japae, jefc, jefc9p, jefk, jefk9p, jipsec, jipsek, jipsj, jis, keis7c, keis7k, keis8c, keis8k, sdeckanji, sjis, sjishi, sjispc, U*
eucJP0201   U*
eucJP2004   U*
eucJPMS     U*
eucJPp      U*
eucTW       big5, chinte,
iso84       U*
iso85       cp866, cp880, U*
iso86       arab8, arabe, U*
iso87       gree8, greee, U*
iso88       hebr8, hebre, U*
iso89       turk8, turke, U*
iso813      U*
iso815      cp1140, cp1141, cp1142, cp1143, cp1144, cp1145, cp1146, cp1147, cp1148, cp1149, cp924, iso81, U*
itale       iso81, roma8
japae       eucJP, sjis, U*
jefc        eucJP, sjis, U*
jefc9p      eucJP, sjis, U*
jefc9pEX    U*
jefcEX      U*
jefk        eucJP, sjis, U*
jefk9p      eucJP, sjis, U*
jefk9pEX    U*
jefkEX      U*
jipsec      eucJP, sjis, U*
jipsecEX    U*
jipsek      eucJP, sjis, U*
j
kore5       koree, U*
koree       kore5
roc15       big5, chinte, eucTW, U*
roma8       cp037, cp277, cp500, engle, finne, frene, germe, icele, iso81, itale, spane, swede, U*
sdeckanji   eucJP, sjis
sjis        cp930, cp939, deckanji, eucJP, japae, jefc, jefc9p, jefk, jefk9p, jipsec, jipsek, jipsj, jis, jishp, keis7c, keis7k, keis8c, keis8k, sdeckanji, U*
sjis0201    U*
sjis2004    U*
sjisMS      U*
sjishi      eucJP, jis
sjisp       U*
sjispc      eucJP, jis
spane       iso81, roma8
swede       iso81, roma8
thai8       thaie, U*
thaie       thai8
turk8       iso
            sjis2004, sjisMS, sjisp, thai8, ucs2, ucs2be, ucs2le, ucs4le, utf8
ucs4be      big5, cp1140, cp1141, cp1142, cp1143, cp1144, cp1145, cp1146, cp1147, cp1148, cp1149, cp1250, cp1251, cp1252, cp1253, cp1254, cp1255, cp1256, cp1257, cp1258, cp437, cp737, cp775, cp850, cp852, cp855, cp857, cp860, cp861, cp862, cp863, cp864, cp865, cp866, cp869, cp874, cp924, cp930, cp939, eucJP, eucJP0201, eucJP2004, eucJPMS, eucJPp, eucTW, gb18030, greee, hkbig5, hp15CN, iso81, iso813, iso815, iso82, iso84, iso85, iso86, iso87, iso