bug#16812: Eszett handling
From: Ben Boeckel
Subject: bug#16812: Eszett handling
Date: Wed, 19 Feb 2014 13:59:18 -0500
User-agent: Mutt/1.5.21 (2010-09-15)
[ I am not subscribed; please keep me on the CC. ]
Hi,
After reading the new grep announcement on LWN[1], I wondered how the
German eszett ('ß') is handled. It seems that it isn't handled at all.
This may end up with the same resolution as the recent LJ/Lj thread[2],
though.
Basically, grep doesn't seem to support multi-character alternatives
when changing case. The uppercase of 'ß' is either 'SS' or 'ẞ' (U+1E9E),
depending on the context[3]. From some poking, only the latter is
supported. My original thought[4] was that the code would generate
'[ßSS]', which would be wrong when matching: a bracket expression matches
exactly one character, so it can never match the two-character 'SS'. It
would instead need to generate '(ß|SS)'. It now seems that, with the new
code, '(ß|SS|ẞ)' or even '(ß|[sS][sS]|ẞ)' would need to be generated
instead.
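A minimal sketch of the bracket-versus-alternation distinction above. It uses Python's own `re` module and `str` case methods, not grep's regex engine, so it illustrates the idea rather than grep's actual behavior. The helper `eszett_aware_pattern` is hypothetical: it expands each eszett in a literal into a non-capturing `(?:ß|ss|ẞ)` alternation and relies on `re.IGNORECASE` for the remaining letters (and for matching "SS", "Ss" and "sS" via "ss").

```python
import re

# Unicode case mappings as exposed by Python's str methods (full mappings).
print("ß".upper())     # SS  (the default uppercase is two letters)
print("ẞ".lower())     # ß   (U+1E9E, capital sharp s)
print("ß".casefold())  # ss

# Python's re compares case-insensitively one character at a time,
# so a lone "ß" cannot match the two-character "SS".
print(re.fullmatch("ß", "SS", re.IGNORECASE))  # None

# The bracket-expression idea: it always consumes exactly one character.
bracket = re.compile("stra[ßSS]e", re.IGNORECASE)

# Hypothetical helper (not grep's code): expand each eszett in a literal
# into an alternation; every other character is escaped as a literal.
def eszett_aware_pattern(literal: str) -> re.Pattern:
    parts = []
    for ch in literal:
        if ch in "ßẞ":
            # With re.IGNORECASE, "ss" also matches "SS", "Ss" and "sS".
            parts.append("(?:ß|ss|ẞ)")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)

alternation = eszett_aware_pattern("straße")

for word in ["straße", "STRASSE", "STRAẞE", "strase"]:
    print(word, bool(bracket.fullmatch(word)), bool(alternation.fullmatch(word)))
```

Here the bracket version rejects "STRASSE" yet wrongly accepts "strase", while the alternation accepts "straße", "STRASSE" and "STRAẞE" and rejects "strase", which is the behavior the '(ß|[sS][sS]|ẞ)' form is meant to give.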
I've attached a test case I wrote based on 'turkish-eyes'. I release it
to the public domain.
Thanks,
--Ben
[1]https://lwn.net/Articles/586899/
[2]https://lists.gnu.org/archive/html/bug-grep/2014-02/msg00004.html
[3]https://en.wikipedia.org/wiki/Capital_%C3%9F
[4]https://lwn.net/Articles/587010/
Attachment: german-eszett (text document)