bug-gnulib

Re: Dealing with character ranges in grep


From: Jim Meyering
Subject: Re: Dealing with character ranges in grep
Date: Fri, 10 Jun 2011 22:19:58 +0200

Bruno Haible wrote:
>> With my proposal, distros/people that use --with-included-regex would
>> get understandable semantics + no equivalence classes
>> ...
>> locale behavior of regex are irremediably
>> broken.  For example, when you have a collation element, you can match
>> it using ranges (e.g. [d-i] matches "ch" in Czech; "ch" collates after
>> "h"), and even apply negation (e.g. [^c-h] matches "ch" too).  However
>> there is no way to anchor your match to the beginning of the collation
>> element.  So "chci" matches both /[c-h]+ci/ and /[^c-h]+ci/.  It is
>> beyond repair, and [=e=] is the only part that can be salvaged.
>
> So, Jim and you appear to agree that equivalence classes [=e=] are a
> reasonable feature outside LC_ALL=C.
>
> What would it take to let distros/people use --with-included-regex and
> get understandable semantics for ranges + working equivalence classes?
>
> I would prefer that to your proposal, because it cannot be seen as a
> regression by people who care about equivalence classes.
>
> Can that be done through gnulib code?

A glibc-independent solution would be great.
Then GNU tr's equivalence classes could finally become useful
even on non-glibc systems.
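
A minimal sketch of the code-point ("understandable") range semantics
discussed above, assuming Python's `re` module as a stand-in: `re` compares
characters by code point and knows nothing about collation elements, so it
does not reproduce the glibc locale behavior described in the quoted text.
The helper name `matches` is hypothetical. Under these semantics, "chci"
matches only one of the two patterns, never both.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if pattern matches anywhere in text (code-point ranges)."""
    return re.search(pattern, text) is not None


word = "chci"

# [c-h]+ consumes "ch" as two separate characters, then "ci" follows.
in_range = matches(r"[c-h]+ci", word)

# [^c-h]+ needs at least one character outside c..h before "ci";
# the only candidates here are 'c' and 'h', so there is no match.
negated = matches(r"[^c-h]+ci", word)

print(in_range, negated)
```

With code-point ranges the bracket expression and its negation are mutually
exclusive for any single character, which is the property the collation-aware
ranges in the Czech example lose.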


