Fri, 01 Sep 2006 21:26:59 +0900
SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/22.0.50 (i686-pc-linux-gnu) MULE/5.0 (SAKAKI)
Thank you for the info!
In article <address@hidden>, YAMAMOTO Mitsuharu <address@hidden> writes:
> "Unicode Technical Report #17, Character Encoding Model"
> (http://www.unicode.org/reports/tr17/index.html) says:
> Examples of Unicode Character Encoding Schemes:
> Unicode 1.1 had three character encoding schemes: UTF-8, UCS-2BE,
> and UCS-2LE, although the latter two were not named that way at
> the time.
Ah! So here we can see the term "UCS-2BE" used as a CES (character
encoding scheme). But how was it defined? (I don't have Unicode 1.1.)
> I suspect "UCS-2BE" is just a customary name and not explicitly
> defined even in ISO/IEC 10646.
> "UTF-8 and Unicode FAQ" (http://www.cl.cam.ac.uk/~mgk25/unicode.html)
> No endianess is implied by the encoding names UCS-2, UCS-4, UTF-16,
> and UTF-32, though ISO 10646-1 says that Bigendian should be
> preferred unless otherwise agreed. It has become customary to
> append the letters "BE" (Bigendian, high-byte first) and "LE"
> (Littleendian, low-byte first) to the encoding names in order to
> explicitly specify a byte order.
I don't know how authoritative this page is, but it also says:
A full featured character encoding converter will have
to provide the following 13 encoding variants of Unicode:
UCS-2, UCS-2BE, UCS-2LE, UCS-4, UCS-4LE, UCS-4BE,
UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE,
UTF-32LE
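To make the difference between some of these variants concrete, here is
a small Python sketch that prints the bytes of one character under the
codecs Python ships with. The sample character and the helper name
encoded_forms are my own choices for illustration. Python has no
separate UCS-2 codec, so only the UTF variants are shown.

```python
import codecs

sample = "\u3042"  # HIRAGANA LETTER A, an arbitrary BMP character


def encoded_forms(ch):
    """Return the hex bytes of ch under several byte-order-explicit codecs."""
    names = ["utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"]
    return {name: ch.encode(name).hex() for name in names}


for name, hexbytes in encoded_forms(sample).items():
    print(f"{name:10} {hexbytes}")

# Without a BE/LE suffix, Python's encoder prepends a byte order mark
# and then writes the data in the machine's native byte order.
unsuffixed = sample.encode("utf-16")
print(unsuffixed[:2] in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE))
```

The BE and LE forms differ only in the order of the bytes within each
code unit; the unsuffixed form is the one that needs a BOM (or an
outside agreement) to be decoded reliably.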
So it seems that UCS-2BE is not a mislabel of UTF-16BE, and that
treating it as a subset of UTF-16BE that does not use surrogate
pairs (as iconv does) is the right thing.
I'll try to implement it (and others) in emacs-unicode-2.
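As a rough illustration of the "subset of UTF-16BE" idea (this is not
the emacs-unicode-2 code, only a Python sketch; the function names
encode_ucs2be and decode_ucs2be are hypothetical, and rejecting
out-of-range input with an error is my assumption, since a real
converter might substitute a replacement character instead):

```python
def encode_ucs2be(text: str) -> bytes:
    """Encode text as UCS-2BE: one big-endian 16-bit unit per character.

    Characters outside the BMP would need a surrogate pair in UTF-16BE,
    so this sketch refuses them. Lone surrogates are refused as well.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            raise ValueError(f"U+{cp:04X} is outside the BMP")
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"U+{cp:04X} is a surrogate code point")
        out += cp.to_bytes(2, "big")
    return bytes(out)


def decode_ucs2be(data: bytes) -> str:
    """Decode UCS-2BE, treating any surrogate code unit as an error."""
    if len(data) % 2:
        raise ValueError("odd number of bytes")
    chars = []
    for i in range(0, len(data), 2):
        cp = int.from_bytes(data[i:i + 2], "big")
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"surrogate code unit 0x{cp:04X} in UCS-2 data")
        chars.append(chr(cp))
    return "".join(chars)


# For BMP-only text the result is byte-for-byte the same as UTF-16BE.
print(encode_ucs2be("A\u3042") == "A\u3042".encode("utf-16-be"))
```

The only place the two encodings diverge in this sketch is at the
surrogate range: UTF-16BE combines a pair into one character, while the
UCS-2BE functions above reject it.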
By the way, why do people want so many variants... sigh...