Re: UCS-2BE
From: YAMAMOTO Mitsuharu
Subject: Re: UCS-2BE
Date: Fri, 01 Sep 2006 20:30:26 +0900
User-agent: Wanderlust/2.14.0 (Africa) SEMI/1.14.6 (Maruoka) FLIM/1.14.6 (Marutamachi) APEL/10.6 Emacs/22.0.50 (sparc-sun-solaris2.8) MULE/5.0 (SAKAKI)
>>>>> On Fri, 01 Sep 2006 10:19:34 +0900, Kenichi Handa <address@hidden> said:
> UCS-XXX are CEF, and UTF-XXX are CES. So, UCS-XXX are not
> appropriate label names for specifying how to byte-serialize
> characters (i.e. on saving characters in a file). At least, that is
> the official definition in Unicode.
IIUC, UCS is ISO/IEC 10646 terminology rather than Unicode
terminology, except in Unicode 1.1 (though there would be some
references to it in the Unicode documentation, of course).
"Unicode Technical Report #17, Character Encoding Model"
(http://www.unicode.org/reports/tr17/index.html) says:
Examples of encoding forms as applied to particular coded character
sets:
Name          Encoding forms
Unicode 4.0   UTF-16 (default), UTF-8, or UTF-32 encoding form
Unicode 3.0   either UTF-16 (default) or UTF-8 encoding form
Unicode 1.1   either UCS-2 (default) or UTF-8 encoding form
ISO/IEC 10646, depending on the declared implementation levels, may
have UCS-2, UCS-4, UTF-16, or UTF-8.
Examples of Unicode Character Encoding Schemes:
The Unicode Standard has seven character encoding schemes: UTF-8,
UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, and UTF-32LE.
Unicode 1.1 had three character encoding schemes: UTF-8, UCS-2BE,
and UCS-2LE, although the latter two were not named that way at
the time.
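To make the distinction between an encoding form and an encoding scheme concrete, here is a minimal Python sketch. The helper names utf16_code_units and serialize are hypothetical and appear in neither report; the point is only that the encoding form yields 16-bit code units, while the encoding scheme additionally fixes their byte order.

```python
import struct

def utf16_code_units(text):
    """Encoding form: map each character to one or two 16-bit code units."""
    units = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            units.append(cp)                   # BMP: a single code unit
        else:
            cp -= 0x10000
            units.append(0xD800 + (cp >> 10))    # high surrogate
            units.append(0xDC00 + (cp & 0x3FF))  # low surrogate
    return units

def serialize(units, big_endian=True):
    """Encoding scheme: write the code units out in a fixed byte order."""
    fmt = (">" if big_endian else "<") + "%dH" % len(units)
    return struct.pack(fmt, *units)

units = utf16_code_units("A\U0001F600")
be_bytes = serialize(units, big_endian=True)
le_bytes = serialize(units, big_endian=False)
```

For text limited to the BMP, the code units are what UCS-2 would produce as well, so the same serialization step corresponds to what is customarily called UCS-2BE or UCS-2LE.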
I suspect "UCS-2BE" is just a customary name and not explicitly
defined even in ISO/IEC 10646.
"UTF-8 and Unicode FAQ" (http://www.cl.cam.ac.uk/~mgk25/unicode.html)
says:
No endianess is implied by the encoding names UCS-2, UCS-4, UTF-16,
and UTF-32, though ISO 10646-1 says that Bigendian should be
preferred unless otherwise agreed. It has become customary to
append the letters "BE" (Bigendian, high-byte first) and "LE"
(Littleendian, low-byte first) to the encoding names in order to
explicitly specify a byte order.
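As a small illustration of the "BE"/"LE" convention, the following Python sketch encodes one BMP character in both byte orders. The codec names "utf-16-be", "utf-16-le" and "utf-16" are Python's own, not ISO/IEC 10646 terminology; Python has no codec named UCS-2BE, so UTF-16 stands in here, which yields the same bytes for BMP characters.

```python
import codecs

ch = "\u3042"  # HIRAGANA LETTER A (U+3042), a BMP character

be = ch.encode("utf-16-be")  # high byte first: 0x30 0x42
le = ch.encode("utf-16-le")  # low byte first:  0x42 0x30

# With an explicit BE/LE name no byte order mark is written.
# The plain "utf-16" decoder instead reads the order from a leading BOM.
from_be = (codecs.BOM_UTF16_BE + be).decode("utf-16")
from_le = (codecs.BOM_UTF16_LE + le).decode("utf-16")
```

Both decodes give back the original character, which shows that the suffix and the byte order mark are two ways of conveying the same information.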
YAMAMOTO Mitsuharu
address@hidden