Re: accents

From: Chet Ramey
Subject: Re: accents
Date: Sun, 15 May 2011 19:10:21 -0400
User-agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv: Gecko/20110303 Lightning/1.0b2 Thunderbird/3.1.9

On 5/15/11 6:38 PM, Andreas Schwab wrote:
> Chet Ramey <address@hidden> writes:
>> On 5/10/11 9:17 AM, Greg Wooledge wrote:
>>> In yours, however, it is 0x65 0xcc 0x81 which is U+0065 LATIN SMALL
>> That's not valid UTF-8, since UTF-8 requires that the shortest sequence
>> be used to encode a character.
> 0x65 0xcc 0x81 is the correct UTF-8 encoding for the two character
> sequence U+0065 U+0301.

That's a non sequitur.  My point is that, as I read it, UTF-8 requires the
use of the shortest sequence that can represent a particular character.
In this case, that means that U+00E9 must be used to represent e with acute
instead of e plus U+0301.

This doesn't mean that Mac OS X and maybe bash don't have problems with
certain combining characters.

>> The general problem with combining
>> characters still exists (the one in the message I referenced in an
>> earlier reply), but this case has more to do with Mac OS X and its use
>> of both precomposed and decomposed UTF-8 than anything.
> There is no such thing as "precomposed UTF-8" and "decomposed UTF-8".

Sorry, I meant to write "Unicode".

> UTF-8 is an encoding of Unicode, and both NFD and NFC are valid forms of
> Unicode.

Sure, nobody's arguing that.  The point is that the UTF-8 encodings
of precomposed and decomposed Unicode are different, so you're not going
to see the same byte sequence from the keyboard as from the file system
on Mac OS X.  Applications have to work around that.
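
A minimal Python sketch of that difference, assuming only the standard
unicodedata module. The variable names are illustrative, and normalizing
before comparing is just one possible workaround, not a description of
what bash or Mac OS X actually does:

```python
import unicodedata

# Precomposed form (NFC): the single code point U+00E9.
precomposed = "\u00e9"
# Decomposed form (NFD): U+0065 followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "e\u0301"

nfc_bytes = precomposed.encode("utf-8")
nfd_bytes = decomposed.encode("utf-8")

print(nfc_bytes.hex(" "))  # c3 a9
print(nfd_bytes.hex(" "))  # 65 cc 81

# The byte sequences differ, so a plain byte comparison fails.
print(nfc_bytes == nfd_bytes)  # False

# Normalizing both strings to the same form makes them compare equal.
same = (unicodedata.normalize("NFC", precomposed)
        == unicodedata.normalize("NFC", decomposed))
print(same)  # True
```

Both strings display as the same e with acute, but only the second one
produces the 0x65 0xcc 0x81 sequence quoted earlier in the thread.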

``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    address@hidden    http://cnswww.cns.cwru.edu/~chet/
