Re: Case mapping of sharp s

From: Ulrich Mueller
Subject: Re: Case mapping of sharp s
Date: Mon, 16 Nov 2009 17:38:26 +0100

>>>>> On Mon, 16 Nov 2009, Kenichi Handa wrote:

> In article <address@hidden>, Ulrich Mueller <address@hidden> writes:
>> In Unicode since version 5.1.0 the U+1E9E code point is assigned
>> to "LATIN CAPITAL LETTER SHARP S". Would it be possible to add a
>> mapping from this to the lower case ß, as in the patch below?

>> However, I've noticed that similar mappings for Turkish ı (dotless
>> i) and İ (I with dot) were commented out [1]. Is it still so that
>> such a change would "make searches slow", as stated in the comment?

> That kind of setting surely makes the searching of ß and ẞ slow
> because we can't use BM search when case-fold-search is non-nil.
> BM search is possible only when all case-equivalent characters are
> represented by the same byte length, and differ only in the last
> byte.

So do I understand this right: In order to perform a Boyer-Moore
search, the case-equivalent characters must either both be ASCII or
lie in the same aligned block of 64 code points (because the last
byte in UTF-8 encodes only 6 bits)?

Is that also the reason why ÿ and Ÿ (U+00FF and U+0178, small/capital
y with diaeresis) don't form a case pair?
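
To make the condition concrete, here is a minimal Python sketch of the
check as I understand it from the description above. It is not Emacs
code: the function name bm_compatible is made up for illustration, and
it assumes that plain UTF-8 byte sequences behave like the internal
representation for these characters.

```python
def bm_compatible(a: str, b: str) -> bool:
    """Return True if a and b have the same UTF-8 byte length and
    differ at most in their last byte (the condition quoted above)."""
    ea, eb = a.encode("utf-8"), b.encode("utf-8")
    return len(ea) == len(eb) and ea[:-1] == eb[:-1]


# Candidate case pairs mentioned in this thread, plus two ordinary ones.
pairs = [("a", "A"), ("ä", "Ä"), ("ÿ", "Ÿ"), ("ß", "ẞ"), ("ı", "I")]

for lower, upper in pairs:
    print(f"U+{ord(lower):04X} / U+{ord(upper):04X}: "
          f"{bm_compatible(lower, upper)}")
```

Under these assumptions only a/A and ä/Ä pass. ÿ/Ÿ have the same
length but differ in the first byte, and ß/ẞ as well as ı/I differ in
byte length.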

> So, if you are sure that searching of ß is very rare (I have
> no idea), please install it.

Usage of (lower case) ß is very common in a German language context,
so I'd guess that searching for it is not so rare.

On the other hand, capital ẞ is not used in regular German orthography
(that's probably the reason why the character was added to Unicode
only in 2008). So if the change would noticeably reduce search speed,
then I think it's not worthwhile.

By what factor is the non-BM search slower, as compared to the BM
search?

> By the way, I think it's possible to improve the current BM-search
> for such a case. For instance, to search "straße", we at first do
> BM-search for "stra" part and then check the remaining "ße" part.
> Aren't there any challenger?
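
A rough Python sketch of that two-stage idea, just to illustrate the
control flow. Everything here is an assumption made for illustration:
the function names are invented, str.find stands in for the real
BM search, "BM-safe" is simplified to "ASCII", and str.lower serves as
the case table (it maps ẞ to ß).

```python
def split_pattern(pattern: str) -> tuple[str, str]:
    """Split into the longest leading ASCII run (treated as BM-safe in
    this sketch) and the remaining tail."""
    n = 0
    while n < len(pattern) and pattern[n].isascii():
        n += 1
    return pattern[:n], pattern[n:]


def fold(ch: str) -> str:
    """Per-character case folding; stand-in for a real case table."""
    return ch.lower()


def search(text: str, pattern: str) -> int:
    """Case-insensitive search: fast search for the prefix, then a
    character-by-character check of the tail. Returns index or -1."""
    prefix, tail = split_pattern(pattern)
    # Fold only ASCII characters so that positions stay aligned with text.
    ascii_folded = "".join(c.lower() if c.isascii() else c for c in text)
    needle = prefix.lower()
    start = 0
    while True:
        pos = ascii_folded.find(needle, start)  # stand-in for BM search
        if pos < 0:
            return -1
        end = pos + len(prefix)
        candidate = text[end:end + len(tail)]
        if len(candidate) == len(tail) and all(
            fold(a) == fold(b) for a, b in zip(candidate, tail)
        ):
            return pos
        start = pos + 1  # prefix matched but tail did not; keep looking


print(search("Die Hauptstraße", "straße"))
print(search("HAUPTSTRAẞE", "straße"))
```

The slow comparison then runs only at positions where the prefix has
already matched, so the cost depends on how selective the prefix is.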

