bug-gnulib


Re: Dealing with character ranges in grep


From: Jim Meyering
Subject: Re: Dealing with character ranges in grep
Date: Wed, 15 Jun 2011 21:12:39 +0200

Bruno Haible wrote:
> Paolo,
>
>> > [=e=] to match "e" as well as accented versions like é, è and ê).
>> > That is the one feature that you get with glibc, and that you would
>> > sacrifice when building --with-included-regex.
>>
>> I agree.  It's up to distros to choose, of course.
>
> If you are on the point of sacrificing a glibc feature in many programs,
> then IMO you should first talk with the glibc people to see what alternative
> they can offer.

People who build the tools currently have the choice of using
--with-included-regex or
--without-included-regex.

Note that putting equivalence classes (and backrefs) aside, the
interpretation of ranges is done in dfa.c, which means the vast
majority of range uses never even require use of regexp code.

However, backreferences force these tools to skip the DFA-based
optimization and resort to running the regexp code.  In that case,
there is a dichotomy.  Adding a backreference to a range-including
regexp would have the surprising consequence of changing how that range
is interpreted when the tool is built to use glibc's regexp code.
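To make the dichotomy concrete, here is a minimal sketch of two ways a
matcher could decide whether a character falls inside a range such as [a-z].
It is not code from dfa.c or from glibc; the helper names are hypothetical,
and the assumption that one code path compares byte values while the other
follows the locale's collation order is mine, for illustration only.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: range membership by raw byte value.  */
static int
in_range_by_byte (unsigned char c, unsigned char lo, unsigned char hi)
{
  assert (lo <= hi);  /* a reversed range is a caller error in this sketch */
  return lo <= c && c <= hi;
}

/* Hypothetical helper: range membership by the current locale's
   collation order, using single-character strings and strcoll.  */
static int
in_range_by_collation (char c, char lo, char hi)
{
  char cs[2] = { c, '\0' };
  char los[2] = { lo, '\0' };
  char his[2] = { hi, '\0' };
  return strcoll (los, cs) <= 0 && strcoll (cs, his) <= 0;
}
```

In the default "C" locale the two helpers agree, since strcoll behaves
like strcmp there.  After a setlocale call that selects a locale with a
different collation order, they may disagree for the same character, and
that is the kind of inconsistency a user would see if adding a
backreference silently switched the matcher from one rule to the other.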

Thus, if we go this route, we are effectively saying
that people who want self-consistent regex-handling
in our tools must build with --with-included-regex or end
up causing subtle problems.

That's a big leap.
I'm not saying I won't take upstream grep over the edge,
but I'd like to hear what a few distro-maintainers think.


