Re: Case mapping of sharp s

From: Ulrich Mueller
Subject: Re: Case mapping of sharp s
Date: Fri, 20 Nov 2009 15:43:05 +0100

>>>>> On Fri, 20 Nov 2009, Stephen J Turnbull wrote:

>> When the search is for equivalence classes of characters (e.g. case
>> folding), then I think it must operate on whole characters and
>> therefore has to find the start of each multibyte sequence.

> This is false for certain equivalence classes, namely those that
> cause only one octet of the multibyte representation to change. For
> Mule encoding, this works for ranges of 96 characters, such as all
> the unibyte charsets. For UTF-8, it works for ASCII, and IIRC for
> letters in the Latin-1 set, and maybe many other Latin letters.

And how do you know that a UTF-8 encoded character is e.g. in the
Latin-1 set, without testing the first byte(s) of the multibyte sequence?
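
To make the question concrete, here is a minimal Python sketch, assuming valid UTF-8 input. The helper names are hypothetical and not taken from Emacs or any library. It checks whether a case pair differs in exactly one octet, and shows that deciding whether a two-octet sequence falls in U+0080..U+00FF requires looking at its lead octet.

```python
def single_octet_case_pair(ch: str) -> bool:
    """True if ch and its uppercase form differ in exactly one UTF-8 octet.

    Hypothetical helper for illustration; uses Python's own case mapping.
    """
    lower = ch.encode("utf-8")
    upper = ch.upper().encode("utf-8")
    if len(lower) != len(upper):
        return False
    return sum(x != y for x, y in zip(lower, upper)) == 1


def lead_octet_in_latin1_supplement(seq: bytes) -> bool:
    """True if a valid UTF-8 sequence encodes U+0080..U+00FF.

    Those code points are exactly the ones whose lead octet is 0xC2 or 0xC3,
    so the answer depends on the first octet, not the last one.
    """
    return seq[0] in (0xC2, 0xC3)


if __name__ == "__main__":
    # a, a-umlaut, sharp s, y-diaeresis
    for ch in ("a", "\u00e4", "\u00df", "\u00ff"):
        print(repr(ch), ch.encode("utf-8").hex(), single_octet_case_pair(ch))
```

With Python's case mapping, "a" and a-umlaut (U+00E4) pass the single-octet test, while sharp s (U+00DF, which uppercases to "SS") and y-diaeresis (U+00FF, which uppercases to U+0178 with lead octet 0xC5) do not.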

