bug-grep

bug#16812: Eszett handling


From: Ben Boeckel
Subject: bug#16812: Eszett handling
Date: Wed, 19 Feb 2014 13:59:18 -0500
User-agent: Mutt/1.5.21 (2010-09-15)

[ I am not subscribed; please keep me on the CC. ]

Hi,

Reading the new grep release announcement on LWN[1], I wondered how the
German eszett (ß) is handled. It seems that it isn't handled at all.
This may end up with the same resolution as the recent LJ/Lj thread[2],
though.

Basically, grep doesn't seem to support multi-character alternatives
when changing case. The uppercase of 'ß' is either 'SS' or 'ẞ',
depending on the context[3]. From some experimentation, only the latter
is currently supported. My original thought[4] was that the code would
generate '[ßSS]', which would be wrong for matching: a bracket
expression matches exactly one character, so it can never match the
two-character 'SS'. It would instead need to generate '(ß|SS)'. With the
new code, it now seems that '(ß|SS|ẞ)', or even '(ß|[sS][sS]|ẞ)', would
need to be generated instead.
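To make the bracket-versus-alternation difference concrete, here is a minimal sketch using Python's `re` module. It is not grep's matcher (grep is written in C and uses POSIX regex syntax), and the helper names `eszett_pattern` and `naive_pattern` are hypothetical, chosen only for illustration. The alternation is the '(ß|[sS][sS]|ẞ)' form proposed above; the bracket form is the rejected '[ßSS]' idea.

```python
import re

# The alternation proposed in the report. Unlike a bracket expression,
# its alternatives can differ in length, so "ss" (two letters) can
# stand in for a single eszett.
ESZETT_GROUP = "(ß|[sS][sS]|ẞ)"


def eszett_pattern(literal: str) -> str:
    """Hypothetical helper: build a Python regex from a literal string,
    replacing each 'ß' or 'ẞ' with ESZETT_GROUP. Every other character
    is escaped so it has no special regex meaning."""
    return "".join(ESZETT_GROUP if ch in "ßẞ" else re.escape(ch) for ch in literal)


def naive_pattern(literal: str) -> str:
    """Hypothetical helper for the rejected idea: replace each eszett with
    the bracket expression '[ßSS]'. A bracket expression matches exactly
    one character, so the two-letter 'SS' can never match."""
    return "".join("[ßSS]" if ch in "ßẞ" else re.escape(ch) for ch in literal)


if __name__ == "__main__":
    # Unicode default case mappings, as exposed by Python's str methods.
    print("ß".upper())                     # SS (full uppercase mapping)
    print("ẞ".lower())                     # ß
    print("ß".casefold(), "ẞ".casefold())  # ss ss

    good = eszett_pattern("Straße")
    bad = naive_pattern("Straße")
    for word in ("Straße", "STRASSE", "STRAẞE", "strasse", "Strase"):
        print(
            word,
            "alternation:", bool(re.fullmatch(good, word, re.IGNORECASE)),
            "bracket:", bool(re.fullmatch(bad, word, re.IGNORECASE)),
        )
```

Running it, the alternation accepts 'STRASSE' and 'strasse' while the bracket form rejects both, and 'Strase' is rejected by both because a single 's' is not a valid stand-in for 'ß'.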

I've attached a test case I wrote based on 'turkish-eyes'. I release it
to the public domain.

Thanks,

--Ben

[1] https://lwn.net/Articles/586899/
[2] https://lists.gnu.org/archive/html/bug-grep/2014-02/msg00004.html
[3] https://en.wikipedia.org/wiki/Capital_%C3%9F
[4] https://lwn.net/Articles/587010/

Attachment: german-eszett
Description: Text document

