Re: Case insensitivity seems to ignore lower bound of interval
From: Eric Blake
Subject: Re: Case insensitivity seems to ignore lower bound of interval
Date: Wed, 27 Apr 2011 14:55:49 -0600
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.15) Gecko/20110307 Fedora/3.1.9-0.39.b3pre.fc14 Lightning/1.0b3pre Mnenhy/0.8.3 Thunderbird/3.1.9

On 04/27/2011 02:40 PM, John Cowan wrote:
> Aharon Robbins scripsit:
>
>> I do agree that the behavior is surprising, disconcerting, undesirable,
>> and so on. For this reason, the upcoming version of gawk translates
>> ranges of the form [d-h] into '[defgh]' before compiling the regular
>> expression.
>
> Alas, that means that in a locale where e-acute sorts after e, the regex
> [d-h] will not match it. You can't have everything at once, but it
> would be good to have a switch to turn this behavior on and off.

POSIX already states that the regex [d-h] is unspecified in all but the
C locale, because there is no one-size-fits-all interpretation of what it
_should_ represent. If you want e-acute in the set, it is always better
to ask for it explicitly. Meanwhile, I welcome this change, as it is
easier to document that the expansion always mirrors the C locale than
to document that it depends on the collation order of the current locale.
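
To make the trade-off concrete, here is a minimal Python sketch of the kind
of translation described above: a range such as [d-h] is expanded by code
point order (which is what the C locale amounts to for ASCII letters) into
an explicit list. This is only an illustration, not gawk's actual code; the
function name expand_bracket is hypothetical, and the sketch ignores
character classes such as [:alpha:], escapes, and a literal ']' inside the
brackets.

```python
import re


def expand_bracket(expr: str) -> str:
    """Expand ranges in a simple bracket expression by code point order.

    Hypothetical helper for illustration only: "[d-h]" becomes "[defgh]".
    Character classes, escapes, and a literal ']' are not handled.
    """
    if not (expr.startswith("[") and expr.endswith("]")):
        raise ValueError("expected a bracket expression such as [d-h]")
    body = expr[1:-1]
    out = []
    i = 0
    while i < len(body):
        # A range is three characters: low end, '-', high end.
        if i + 2 < len(body) and body[i + 1] == "-":
            lo, hi = ord(body[i]), ord(body[i + 2])
            if lo > hi:
                raise ValueError("range end points are out of order")
            out.extend(chr(c) for c in range(lo, hi + 1))
            i += 3
        else:
            # Anything else (including a leading '^') is copied unchanged.
            out.append(body[i])
            i += 1
    return "[" + "".join(out) + "]"


expanded = expand_bracket("[d-h]")
# The expanded set contains only the five ASCII letters d, e, f, g, h.
matches_plain_e = re.fullmatch(expanded, "e") is not None
matches_e_acute = re.fullmatch(expanded, "\u00e9") is not None
# Asking for e-acute explicitly puts it in the set regardless of locale.
explicit = expand_bracket("[d-h\u00e9]")
explicit_matches_e_acute = re.fullmatch(explicit, "\u00e9") is not None
```

With this expansion, e-acute falls outside [d-h] no matter how the current
locale collates it, and listing it explicitly, as in [d-hé], is what brings
it back into the set.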
--
Eric Blake address@hidden +1-801-349-2682
Libvirt virtualization library http://libvirt.org