bug-lilypond



From: Hans Aberg
Subject: Re: wide-char is wide
Date: Wed, 25 Mar 2009 21:59:14 +0100


On 25 Mar 2009, at 17:55, Francisco Vila wrote:

> I am now confused because Trevor has said that the hex value is a
> variable length coding value for the Unicode entity, therefore this
> hex number has to follow the utf-8 rules, not utf-32 which is always a
> 32bit fixed-length value.
> ...
> ... after Trevor I now think the hex value _is_ utf-8
> coded. I might be completely wrong.

You might search this page for "code point":
  http://en.wikipedia.org/wiki/Unicode

It is just a natural number assigned to each abstract character that Unicode defines; it is not itself an encoding. The section
  http://en.wikipedia.org/wiki/Unicode#Architecture_and_terminology
describes the convention of writing these numbers in hexadecimal with the prefix "U+": code points below 2^16 are written with four hex digits (padded with leading zeros if needed), and the others with five or six, as needed.
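As an illustration of the "U+" convention, here is a minimal Python 3 sketch; the helper name code_point_label is hypothetical and only for this example. Python's built-in ord() returns the code point as a plain integer, and the format spec pads it to at least four hex digits.

```python
def code_point_label(ch: str) -> str:
    """Format a character's code point in U+ notation (hypothetical helper)."""
    # ord() gives the code point as an integer; :04X pads to at least four
    # uppercase hex digits, and larger code points simply use more digits.
    return f"U+{ord(ch):04X}"

# "A", e-acute, euro sign, musical G clef (written as escapes to stay ASCII-safe)
for ch in ["A", "\u00e9", "\u20ac", "\U0001D11E"]:
    print(code_point_label(ch))
# prints U+0041, U+00E9, U+20AC, U+1D11E (one per line)
```

The number shown is the code point itself, independent of any byte encoding.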

Then, to get these numbers into a computer, one uses an encoding that translates them into byte sequences; UTF-8, UTF-16 and UTF-32 are among these. UTF-32 ought to be the simplest, because it just stores the code point as a 32-bit binary number, but since computers do not agree on the order of the bytes within such a number, there are two variants: UTF-32BE (big endian, used by PowerPC) and UTF-32LE (little endian, used by Intel x86 PCs). The same holds for UTF-16, which was invented in the days when 16 bits were thought to be enough for all of Unicode, and which was later extended, somewhat irregularly, with surrogate pairs: each code point above U+FFFF is encoded as two 16-bit units taken from the reserved range U+D800 to U+DFFF.
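A small Python 3 sketch of the byte-order and surrogate-pair points, using only the standard "utf-32-be", "utf-32-le" and "utf-16-be" codecs; the helpers hex_bytes and utf16_surrogates are hypothetical names introduced for illustration. The same code point yields reversed byte orders under UTF-32BE and UTF-32LE, and a code point above U+FFFF becomes two 16-bit units in UTF-16.

```python
def hex_bytes(b: bytes) -> str:
    """Render a byte string as space-separated uppercase hex (hypothetical helper)."""
    return " ".join(f"{x:02X}" for x in b)

def utf16_surrogates(cp: int):
    """Split a code point above U+FFFF into a UTF-16 surrogate pair (hypothetical helper)."""
    v = cp - 0x10000                       # 20-bit offset above the 16-bit range
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)

euro = "\u20ac"                            # EURO SIGN, U+20AC
print("UTF-32BE:", hex_bytes(euro.encode("utf-32-be")))   # 00 00 20 AC
print("UTF-32LE:", hex_bytes(euro.encode("utf-32-le")))   # AC 20 00 00

clef = "\U0001D11E"                        # MUSICAL SYMBOL G CLEF, U+1D11E
hi, lo = utf16_surrogates(ord(clef))
print(f"surrogates: {hi:04X} {lo:04X}")                   # D834 DD1E
print("UTF-16BE:", hex_bytes(clef.encode("utf-16-be")))   # D8 34 DD 1E
```

The hand-computed surrogates match what the built-in UTF-16 codec produces.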

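UTF-8 does not have this endianness problem, because it is a sequence of single bytes, and today there is general agreement on how the bits within a byte are ordered. It was invented for use on UNIX systems. It is constructed so that bytes with the highest bit 0 have the same values as in ASCII, while all other characters are encoded as multibyte sequences in which every byte has the highest bit set to 1. It was adopted by Unicode, which limited it to code points up to U+10FFFF (at most four bytes per character), whereas the original design allowed values of up to 31 bits (up to six bytes). So, strictly speaking, there are two UTF-8s.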
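To make the distinction from the quoted question concrete, here is a minimal hand-written UTF-8 encoder in Python 3, assuming the Unicode-restricted variant (code points up to U+10FFFF, surrogates rejected); the name utf8_encode is hypothetical. It shows that the hex number in "U+00E9" is the code point, while its UTF-8 form is the different byte sequence C3 A9. The original 31-bit design would add five- and six-byte forms, which this sketch deliberately omits.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point as UTF-8, Unicode's variant limited to U+10FFFF."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: {cp:#X}")
    if cp < 0x80:                           # 0xxxxxxx: same byte as ASCII
        return bytes([cp])
    if cp < 0x800:                          # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),        # 11110xxx then three 10xxxxxx bytes
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

for cp in [0x41, 0xE9, 0x20AC, 0x1D11E]:
    encoded = utf8_encode(cp)
    assert encoded == chr(cp).encode("utf-8")   # agrees with Python's codec
    print(f"U+{cp:04X} ->", " ".join(f"{b:02X}" for b in encoded))
# U+0041 -> 41
# U+00E9 -> C3 A9
# U+20AC -> E2 82 AC
# U+1D11E -> F0 9D 84 9E
```

Every byte of a multibyte sequence has its highest bit set, so such bytes can never be mistaken for ASCII.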

  Hans Aberg
