bug-grep

bug#25048: --with-included-regex vs. e-acute piped into LC_ALL=fr_FR.iso


From: Eric Blake
Subject: bug#25048: --with-included-regex vs. e-acute piped into LC_ALL=fr_FR.iso88591 grep '[d-f]'
Date: Mon, 28 Nov 2016 10:53:04 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0

On 11/27/2016 10:57 PM, Jim Meyering wrote:
> When grep is configured --with-included-regex, the following command
> fails to print the expected match:
> 
>    printf '\351\n' |LC_ALL=fr_FR.iso88591 src/grep '[d-f]'

But the problem is that POSIX does NOT define what the "expected match"
should be. The very fact that you're using a non-C locale but passing a
range means that you have unspecified behavior per POSIX.  Some regex
engines treat 'e' and 'e-acute' as both being part of the range, others
treat only 'e' as being part of the range.  Expecting any particular
behavior is a bug, unless you know for sure that you are using GNU's
"rational range behavior" which explicitly treats ranges in ALL locales
the same as if they were in the C locale (that is, e-acute is never part
of the [d-f] range under rational range behavior).
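
As a rough illustration of the C-locale baseline that rational range
behavior extends to all locales, the sketch below feeds the same byte
to grep under LC_ALL=C. It assumes any POSIX grep on PATH rather than
the src/grep build from the report, and the variable names are only
illustrative.

```shell
# Sketch: in the C locale, [d-f] covers only the bytes for d, e and f.
# Byte 0xE9 (e-acute in ISO-8859-1) falls outside that range.
eacute_status=0
printf '\351\n' | LC_ALL=C grep '[d-f]' >/dev/null || eacute_status=$?

# A plain ASCII 'e' is inside the range, so grep selects the line.
plain_status=0
printf 'e\n' | LC_ALL=C grep '[d-f]' >/dev/null || plain_status=$?

echo "e-acute: exit $eacute_status, plain e: exit $plain_status"
```

Under these assumptions grep should exit 1 (no match) for e-acute and
0 for plain e.  What the same bracket expression does under
fr_FR.iso88591 is exactly the part POSIX leaves unspecified.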

> 
> Since it's always been this way, I don't plan to attempt a work-around
> before the next release, and instead will probably arrange for that
> test to be skipped when grep is built with the included regex.
> 
> Other ideas welcome,

We SHOULD be adjusting more and more GNU tools to honor rational range
behavior, at least as an option, even if that means that e-acute can
never match [d-f].

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
