bug-gnu-emacs

bug#11309: 24.1.50; Case problems with [:upper:] and Cyrillic, Greek


From: Eli Zaretskii
Subject: bug#11309: 24.1.50; Case problems with [:upper:] and Cyrillic, Greek
Date: Wed, 09 Dec 2020 17:46:10 +0200

> From: Mattias Engdegård <mattiase@acm.org>
> Date: Wed, 9 Dec 2020 15:37:19 +0100
> Cc: Lars Ingebrigtsen <larsi@gnus.org>, Aidan Kehoe <kehoea@parhasard.net>,
>         11309-done@debbugs.gnu.org
> 
> ß is a lower case letter so lowercasep(ß)=false is wrong. As a consequence, 
> matching ß with [:lower:] and [:upper:] doesn't work correctly: ß should be 
> matched by [:lower:] when case-fold-search is nil, and by both [:lower:] and 
> [:upper:] when case-fold-search is non-nil.
> 
> The problem stems from the fact that uppercasep and lowercasep don't use the 
> Unicode case information directly (which perhaps they should) but derive the 
> case indirectly from the upcase and downcase tables, and there is no way to 
> state that a char is lower case but cannot be upcased or downcased. (Below 
> I'm going to use the notation T[C] for the table T indexed by character C.)
> 
> Currently, characters missing from or self-mapping in the upcase and downcase 
> tables are considered to be caseless. For instance, upcase[*]=downcase[*]=* 
> and upcase[中]=downcase[中]=nil. However, we also have upcase[ß]=downcase[ß]=ß, 
> causing the incorrect lowercasep result.
> 
> The solution that I ended up applying was the simplest possible: set 
> upcase[ß]=ẞ (U+1E9E). The special-uppercase properties ensure that (upcase 
> "ß") => "SS", and now all tests pass.
> 
> (An acceptable alternative would have been to set upcase[ß]=nil and adapt 
> lowercasep accordingly. I tried that and it works flawlessly, but involves 
> slightly more changes.)
> 
> And that concludes the resolution of this bug.

Thanks.
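
The table-based derivation described in the quoted message can be illustrated with a small toy model. This is only a sketch in Python, not Emacs's actual implementation: the dictionaries stand in for the upcase and downcase tables, a missing key plays the role of nil, and the function and variable names (lowercasep, uppercasep, upcase_before, upcase_after) are hypothetical stand-ins chosen for the example. It assumes the simplified rule that a character counts as lower case only if the upcase table maps it to a different character.

```python
# Toy model of the case tables; NOT Emacs's real data structures.
# A missing key stands in for a nil table entry.

def lowercasep(c, upcase):
    """Treat c as lower case only if upcase maps it to a different char."""
    up = upcase.get(c)
    return up is not None and up != c

def uppercasep(c, downcase):
    """Treat c as upper case only if downcase maps it to a different char."""
    down = downcase.get(c)
    return down is not None and down != c

# Before the fix: ß maps to itself in both tables, so it looks caseless.
# 中 is absent from both tables (the nil case).
upcase_before = {"a": "A", "A": "A", "*": "*", "ß": "ß"}
downcase = {"a": "a", "A": "a", "*": "*", "ß": "ß"}

# After the fix: upcase[ß] = ẞ (U+1E9E), everything else unchanged.
upcase_after = dict(upcase_before)
upcase_after["ß"] = "\u1e9e"

print(lowercasep("ß", upcase_before))  # False: the reported bug
print(lowercasep("ß", upcase_after))   # True: ß is now seen as lower case
```

In this model, self-mapping entries such as * and missing entries such as 中 stay caseless both before and after the change, which matches the behaviour the message describes. The multi-character result (upcase "ß") => "SS" comes from the separate special-uppercase properties and is not modelled here.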