bug#25048: --with-included-regex vs. e-acute piped into LC_ALL=fr_FR.iso
From: Eric Blake
Subject: bug#25048: --with-included-regex vs. e-acute piped into LC_ALL=fr_FR.iso88591 grep '[d-f]'
Date: Mon, 28 Nov 2016 10:53:04 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0
On 11/27/2016 10:57 PM, Jim Meyering wrote:
> When grep is configured --with-included-regex, the following command
> fails to print the expected match:
>
> printf '\351\n' |LC_ALL=fr_FR.iso88591 src/grep '[d-f]'
But the problem is that POSIX does NOT define what the "expected match"
should be. The very fact that you're using a non-C locale but passing a
range means that you have unspecified behavior per POSIX. Some regex
engines treat both 'e' and 'e-acute' as part of the range; others
treat only 'e' as part of it. Expecting any particular
behavior is a bug, unless you know for sure that you are using GNU's
"rational range behavior" which explicitly treats ranges in ALL locales
the same as if they were in the C locale (that is, e-acute is never part
of the [d-f] range under rational range behavior).
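To make the distinction concrete, here is a minimal sketch, assuming GNU grep and a POSIX sh. It uses the C locale because that is the one locale where POSIX does pin down range membership: [d-f] covers exactly the byte values of 'd' through 'f' (0x64 to 0x66). That byte-value reading is what rational range behavior applies in every locale. Octal 351 is 0xE9, the ISO-8859-1 encoding of e-acute, so it falls outside the range. The variable names are illustrative only, and the sketch does not need fr_FR.iso88591 to be installed.

```shell
#!/bin/sh
# In the C locale, [d-f] is defined by byte value: 0x64..0x66.
# Rational range behavior applies this same rule in every locale.
# Octal 351 (0xE9) is e-acute in ISO-8859-1 and lies outside that range.

# grep -c prints the number of matching lines (0 when nothing matches).
plain_e=$(printf 'e\n' | LC_ALL=C grep -c '[d-f]')
e_acute=$(printf '\351\n' | LC_ALL=C grep -c '[d-f]')

echo "e: $plain_e"
echo "e-acute: $e_acute"
```

Running it should report one matching line for plain 'e' and none for e-acute. Under fr_FR.iso88591 without rational ranges, the second count is exactly the part POSIX leaves unspecified.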
>
> Since it's always been this way, I don't plan to attempt a work-around
> before the next release, and instead will probably arrange for that
> test to be skipped when grep is built with the included regex.
>
> Other ideas welcome,
We SHOULD be adjusting more and more GNU tools to honor rational range
behavior, at least as an option, even if that means that e-acute can
never be matched to [d-f].
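Under rational ranges, a script that really wants e-acute to match has to name it explicitly in the bracket expression. Here is a minimal sketch of that approach, assuming ISO-8859-1 input where e-acute is the single byte octal 351. The C locale is used only so that the result does not depend on which locales are installed, and `pattern` and `count` are illustrative names.

```shell
#!/bin/sh
# Build the bracket expression with printf so the raw byte 0xE9
# (e-acute in ISO-8859-1) becomes an explicit member: [d-f<0xE9>].
pattern=$(printf '[d-f\351]')

# The range still covers only d..f; the explicit member adds e-acute.
count=$(printf '\351\n' | LC_ALL=C grep -c "$pattern")
echo "e-acute with explicit member: $count"
```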
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org