Jim Meyering wrote:
Bruno Haible wrote:
Paolo,
[=e=] to match "e" as well as accented versions like é, è and ê).
That is the one feature that you get with glibc, and that you would
sacrifice when building --with-included-regex.
I agree. It's up to distros to choose, of course.
If you are on the point of sacrificing a glibc feature in many programs,
then IMO you should first talk with the glibc people to see what alternative
they can offer.
People who build the tools currently have the choice of using
--with-included-regex or
--without-included-regex.
Note that putting equivalence classes (and backrefs) aside, the
interpretation of ranges is done in dfa.c, which means the vast
majority of range uses never even require use of regexp code.
However, backreferences force these tools to skip the DFA-based
optimization and resort to running the regexp code. In that case,
there is a dichotomy. Adding a backreference to a range-including
regexp would have the surprising consequence of changing how that range
is interpreted when the tool is built to use glibc's regexp code.
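
To make that dichotomy concrete, here is a rough Python sketch. It is
not code from dfa.c or glibc. It only models the two readings of a
range like [a-z]: a locale-independent one that compares code points,
and a collation-based one. The collation order "aAbB...zZ", the
function names, and the crude backreference check are all assumptions
made up for illustration; real locales and the real matchers differ.

```python
import re
import string

# Hypothetical collation sequence "aAbB...zZ" (ASCII letters only).
# Real locale collation tables are more complex; this is a toy model.
COLLATION = "".join(c + c.upper() for c in string.ascii_lowercase)


def in_range_codepoint(ch, lo, hi):
    """Locale-independent reading: compare raw code points."""
    return ord(lo) <= ord(ch) <= ord(hi)


def in_range_collation(ch, lo, hi):
    """Collation-based reading, using the toy COLLATION order above."""
    if ch not in COLLATION:
        return False
    return COLLATION.index(lo) <= COLLATION.index(ch) <= COLLATION.index(hi)


def has_backreference(pattern):
    """Crude check for \\1..\\9; it does not handle escaped backslashes."""
    return re.search(r"\\[1-9]", pattern) is not None


def range_test_for(pattern):
    """Model the split described above (an assumption, not real dispatch):
    no backreference -> DFA path with code-point ranges,
    backreference    -> regex-library path with collation ranges."""
    if has_backreference(pattern):
        return in_range_collation
    return in_range_codepoint


if __name__ == "__main__":
    for pat in (r"[a-z]", r"\([a-z]\)\1"):
        test = range_test_for(pat)
        print(pat, "treats 'B' as inside a-z:", test("B", "a", "z"))
```

In this model, "B" falls outside [a-z] for the plain pattern but inside
it once a backreference is added, which is the surprising change in
interpretation described above.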
Thus, if we go this route, we are effectively saying
that people who want self-consistent regex-handling
in our tools must build with --with-included-regex or end
up causing subtle problems.
That's a big leap.
I'm not saying I won't take upstream grep over the edge,
but I'd like to hear what a few distro-maintainers think.
To clarify...
I like Arnold's proposal to make regex range handling sane
and locale-independent.
It goes like this (at least for gawk, grep and sed):
- Change how dfa.c interprets ranges like [a-z].
- Change how gnulib's reg* code handles ranges.
- Always use the included regex code (the one from gnulib),
  so that its interpretation is consistent with that of dfa.c.
Grep's current upstream default is to build --without-included-regex,
which makes grep use glibc's regex code.
To make this proposed change go through, that configure-time option would
have to be eliminated, so that we always build with the gnulib-provided
regex code. Of course, if glibc ever changes, we can detect that and
automatically prefer it when possible.