

From: YAMAMOTO Mitsuharu
Subject: Re: UCS-2BE
Date: Fri, 01 Sep 2006 20:30:26 +0900
User-agent: Wanderlust/2.14.0 (Africa) SEMI/1.14.6 (Maruoka) FLIM/1.14.6 (Marutamachi) APEL/10.6 Emacs/22.0.50 (sparc-sun-solaris2.8) MULE/5.0 (SAKAKI)

>>>>> On Fri, 01 Sep 2006 10:19:34 +0900, Kenichi Handa <address@hidden> said:

> UCS-XXX are CEF, and UTF-XXX are CES.  So, UCS-XXX are not
> appropriate label names for specifying how to byte-serialize
> characters (i.e. on saving characters in a file).  At least, that is
> the official definition in Unicode.

IIUC, UCS is ISO/IEC 10646 terminology rather than Unicode
terminology, except in Unicode 1.1 (though the Unicode documentation
does refer to it in places, of course).

"Unicode Technical Report #17, Character Encoding Model"
(http://www.unicode.org/reports/tr17/index.html) says:

  Examples of encoding forms as applied to particular coded character
  sets:

    Name           Encoding forms
    Unicode 4.0    UTF-16 (default), UTF-8, or UTF-32 encoding form
    Unicode 3.0    either UTF-16 (default) or UTF-8 encoding form
    Unicode 1.1    either UCS-2 (default) or UTF-8 encoding form
    ISO/IEC 10646, depending on the declared implementation levels, may
                   have UCS-2, UCS-4, UTF-16, or UTF-8.

  Examples of Unicode Character Encoding Schemes:

    The Unicode Standard has seven character encoding schemes: UTF-8,
    UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, and UTF-32LE.

    Unicode 1.1 had three character encoding schemes: UTF-8, UCS-2BE,
    and UCS-2LE, although the latter two were not named that way at
    the time.
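
To make the form/scheme distinction in the excerpt concrete, here is a
minimal Python sketch. The helper names utf16_code_units and serialize
are hypothetical, made up for this illustration; only str.encode,
struct.unpack, and int.to_bytes come from the standard library. The
encoding form yields 16-bit code units with no byte order, and the
encoding scheme is what fixes the order when those units become bytes.

```python
import struct

def utf16_code_units(text):
    """Return the UTF-16 code units (the encoding form) as integers."""
    data = text.encode("utf-16-be")
    return list(struct.unpack(">%dH" % (len(data) // 2), data))

def serialize(units, byteorder):
    """Turn 16-bit code units into bytes (the encoding scheme).

    byteorder is "big" or "little", as accepted by int.to_bytes.
    """
    return b"".join(unit.to_bytes(2, byteorder) for unit in units)

# Sample text: LATIN CAPITAL LETTER A followed by HIRAGANA LETTER A.
units = utf16_code_units("A\u3042")

# The same code units, serialized in the two possible byte orders.
big_endian = serialize(units, "big")
little_endian = serialize(units, "little")
```

The code units are identical in both cases; only the serialized bytes
differ, which is why a name without a byte order says nothing about
how a file is laid out.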

I suspect "UCS-2BE" is just a customary name that is not explicitly
defined even in ISO/IEC 10646.

"UTF-8 and Unicode FAQ" (http://www.cl.cam.ac.uk/~mgk25/unicode.html)
says:

  No endianess is implied by the encoding names UCS-2, UCS-4, UTF-16,
  and UTF-32, though ISO 10646-1 says that Bigendian should be
  preferred unless otherwise agreed.  It has become customary to
  append the letters "BE" (Bigendian, high-byte first) and "LE"
  (Littleendian, low-byte first) to the encoding names in order to
  explicitly specify a byte order.
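
As a small illustration of the "BE"/"LE" convention, the following
Python sketch uses the standard codecs "utf-16-be", "utf-16-le", and
"utf-16". Python has no codec named UCS-2BE, so UTF-16 stands in for
it here; that substitution is my assumption and is only equivalent
for characters in the Basic Multilingual Plane. The sample character
is arbitrary.

```python
import codecs

text = "\u3042"  # HIRAGANA LETTER A, used only as a sample character

be = text.encode("utf-16-be")  # high byte first
le = text.encode("utf-16-le")  # low byte first

# Reading big-endian bytes with the little-endian codec silently
# produces a different character instead of raising an error.
misread = be.decode("utf-16-le")

# The unsuffixed "utf-16" codec takes the byte order from a leading
# byte order mark when one is present.
from_bom_be = (codecs.BOM_UTF16_BE + be).decode("utf-16")
from_bom_le = (codecs.BOM_UTF16_LE + le).decode("utf-16")
```

Both BOM-prefixed inputs decode back to the original text, while the
misread value shows why an explicit suffix (or a BOM) is needed once
the bytes leave memory.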

                                     YAMAMOTO Mitsuharu
