bug-grep

bug#16812: Eszett handling


From: Johannes Meixner
Subject: bug#16812: Eszett handling
Date: Thu, 20 Feb 2014 11:07:34 +0100 (CET)
User-agent: Alpine 2.00 (LNX 1167 2008-08-23)


Hello,

On Feb 19 13:59 Ben Boeckel wrote (excerpt):
> [ I am not subscribed; please keep me on the CC. ]
> ...
> I had a thought about how the German eszett was handled
> ...
> Basically, it seems that grep doesn't support alternates when changing
> case. The uppercase of 'ß' is either 'SS' or 'ẞ' depending on the
> context

As far as I understand it, you are talking about
"Unicode case folding".

As far as I know, grep does not support "Unicode case folding".

Currently grep works on a purely "character by character" basis,
where each character may be in UTF-8 encoding (one possible
encoding for Unicode characters). So grep supports
the UTF-8 encoding, which could be misunderstood as meaning that
grep supports Unicode, but the latter is not true.
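
To illustrate the difference, here is a rough Python sketch. It is not
grep's code; the helper names per_char_equal and folded_equal are made
up for this example. It only contrasts a comparison that maps one
character to one character with a comparison that uses Unicode full
case folding (Python's str.casefold), where 'ß' folds to 'ss'.

```python
def per_char_equal(a: str, b: str) -> bool:
    """Case-insensitive compare, strictly one character against one character.

    Hypothetical helper for illustration only: it can never match
    'ß' (one character) against 'SS' (two characters).
    """
    if len(a) != len(b):
        return False
    return all(x.lower() == y.lower() for x, y in zip(a, b))


def folded_equal(a: str, b: str) -> bool:
    """Compare after Unicode full case folding, which may change string length."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    # Character by character: the lengths differ, so there is no match.
    print(per_char_equal("straße", "STRASSE"))
    # Full case folding: both sides fold to the same string.
    print(folded_equal("straße", "STRASSE"))
    # The capital sharp s U+1E9E folds to 'ss' as well.
    print(folded_equal("stra\u1e9ee", "strasse"))
```

With the per-character comparison "straße" and "STRASSE" never compare
equal, while with full case folding they do.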

For more details see the various (usually very long) mail threads
regarding "grep -i", in particular together with UTF-8.

For example, see

http://lists.gnu.org/archive/html/bug-grep/2012-06/threads.html#00011

for mail threads like
"Ignore case handling of special unicode characters (case folding)",
which is
http://savannah.gnu.org/bugs/?36682
or the mail thread
"grep -i (case-insensitive) is broken with UTF8".


Kind Regards
Johannes Meixner
--
SUSE LINUX Products GmbH -- Maxfeldstrasse 5 -- 90409 Nuernberg -- Germany
HRB 16746 (AG Nuernberg) GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer
