bug#16812: Eszett handling

bug-grep

From:	Eric Blake
Subject:	bug#16812: Eszett handling
Date:	Wed, 19 Feb 2014 13:27:58 -0700
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.3.0

On 02/19/2014 11:59 AM, Ben Boeckel wrote:
> [ I am not subscribed; please keep me on the CC. ]
> 
> Hi,
> 
> From the new grep announcement on LWN[1], I had a thought about how the
> German eszett was handled. It seems that it hasn't been handled at all.
> This may fall to the same resolution as the recent LJ/Lj thread[2]
> though.
> 
> Basically, it seems that grep doesn't support alternates when changing
> case. The uppercase of 'ß' is either 'SS' or 'ẞ' depending on the
> context[3].

Alas, in terms of POSIX functionality, we can only change case between
single-character entities.  Changing ß to SS is a
single->multi-character change; it is DIFFERENT from the Turkish i
situation (there, although we change between single-byte and multi-byte
encodings, the changes are still always single character).  Similar
problems apply to the Greek final sigma, whose case mapping is also
context-sensitive.
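
To make the single->multi-character problem concrete, here is a small
Python sketch.  It is an illustration only, not grep's code:
upper_one_to_one() is a hypothetical helper that models an interface
limited to one character in, one character out, and Python's str.upper()
stands in for full Unicode case mapping.

```python
def upper_one_to_one(ch: str) -> str:
    """Hypothetical model of a 1:1 case mapping (one character in, one out)."""
    mapped = ch.upper()
    # When the full Unicode mapping expands to several characters, a 1:1
    # interface has nothing it can return except the input itself.
    return mapped if len(mapped) == 1 else ch


word = "straße"

# Character-by-character conversion: the eszett passes through unchanged.
per_char = "".join(upper_one_to_one(c) for c in word)

# Full string conversion: the eszett expands to two characters.
full = word.upper()

print(per_char)             # STRAßE
print(full)                 # STRASSE
print(len(word), len(full)) # 6 7
```

The length change from 6 to 7 characters is exactly what a 1:1 mapping
cannot express.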

As long as we are stuck using the POSIX definition of case changes on a
character-by-character basis, where the input and output are 1:1
character mappings, we cannot handle the German eszett case specially.
For PROPER handling of locale-sensitive case rules, we'd need full
Unicode rules that operate on words, rather than characters, which
quickly gets out of scope of what we can do in POSIX regex.
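
As a rough sketch of why word-level rules matter, the Python snippet
below lowercases a Greek word two ways.  It assumes Python's str.lower(),
which applies the Unicode final-sigma rule when it can see the whole
string; it says nothing about how grep or any POSIX regex implementation
behaves.

```python
word = "ΟΔΟΣ"  # Greek capitals; the last letter is a capital sigma

# Each character lowered in isolation: no context is available, so the
# capital sigma always becomes the medial form (U+03C3).
per_char = "".join(c.lower() for c in word)

# Whole word lowered at once: the trailing sigma is seen to be
# word-final and becomes the final form (U+03C2).
whole = word.lower()

print(per_char)           # οδοσ
print(whole)              # οδος
print(per_char == whole)  # False
```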

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
