Re: locale specific ordering in EN_US -- why is a<A<b<B<y<Y<z<Z?


From: Eric Blake
Subject: Re: locale specific ordering in EN_US -- why is a<A<b<B<y<Y<z<Z?
Date: Mon, 21 May 2012 20:02:52 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20120430 Thunderbird/12.0.1

On 05/21/2012 05:42 PM, Linda Walsh wrote:

>> POSIX explicitly leaves ranges undefined for all but the C locale.  _Other
>> standards_, such as Unicode, are free to add range requirements on top
>> of what POSIX requires, but alas, Unicode collation order does NOT
>> currently specify anything about regular expression or glob range
>> matching, so it is out of scope for Unicode to say what [A-Z] expands to.
> 
> 
> ----
> 
> I think this is the problem.
> 
> A-Z in regular expressions is defined to expand to those characters
> that are _in collating order_, >A, and <Z...

Only in POSIX 1992 or in the C locale.  In POSIX 2001 and POSIX 2008,
and non-C locales, [A-Z] is explicitly undefined, because the definition
of characters in collating order between A and Z did not work out.
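
To make the problem concrete, here is a minimal sketch in Python. It assumes a hypothetical collating sequence a<A<b<B<...<z<Z modelled on the ordering in the subject line; real locale data uses multi-level weights and is more complicated. The helper name `expand_range` is made up for illustration.

```python
import string

# Hypothetical collating sequence a<A<b<B<...<z<Z (an assumption modelled on
# the subject line; real locale collation data is more complex than this).
INTERLEAVED = "".join(
    lo + up for lo, up in zip(string.ascii_lowercase, string.ascii_uppercase)
)

# C-locale style ordering: plain code-point order, i.e. A..Z then a..z.
C_ORDER = "".join(sorted(string.ascii_letters))


def expand_range(first, last, sequence):
    """Return the characters of `sequence` from `first` through `last`, inclusive."""
    start = sequence.index(first)
    end = sequence.index(last)
    if start > end:
        raise ValueError("range endpoints are out of order in this sequence")
    return sequence[start:end + 1]


c_range = expand_range("A", "Z", C_ORDER)
collated_range = expand_range("A", "Z", INTERLEAVED)

print(c_range)         # only the uppercase letters
print(collated_range)  # uppercase letters plus every lowercase letter except 'a'
```

Under the interleaved sequence, a range read as "everything that collates between A and Z" picks up b through z as well, which is the surprising behaviour being discussed.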

> 
> Without a collating order, that expression in REs would never have made any
> sense.  It requires a collating order and is dependent on it.

Ranges still have no portable meaning in any locale except C, because POSIX
no longer requires them to follow collating order.

> The regex(7) man page says that [xx-xx] uses **collating order**:

The regex(7) man page _of which system_?  Just because _some_ systems
(like glibc, picking the POSIX 1992 semantics) have well-defined
semantics doesn't mean that all systems have those same semantics.
According to POSIX, you cannot portably assume ANY semantics for ranges
except in the C locale.  And if RRI (Rational Range Interpretation) gains
traction, you will be able to assume ASCII ordering for ranges across ALL
locales, but that is a different order from the collation of any specific
locale, and it is also a GNU extension not guaranteed by POSIX.
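
The RRI idea can be sketched as a comparison on code points that never consults the locale. This is only an illustration of the concept, not the implementation used by glibc or by any GNU program; the function name `in_range_by_code_point` is hypothetical.

```python
def in_range_by_code_point(ch, first, last):
    """True if `ch` lies between `first` and `last` by code point.

    The current locale is never consulted, so the answer is the same everywhere.
    """
    return ord(first) <= ord(ch) <= ord(last)


# Filter a mixed-case sample through the range A-Z.
upper_hits = [c for c in "aAbBzZ" if in_range_by_code_point(c, "A", "Z")]
print(upper_hits)  # ['A', 'B', 'Z']
```

With this interpretation [A-Z] never matches a lowercase letter, whatever the locale's collating sequence says.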

> ----
> Seems pretty clear -- regexes aren't exempt from collating order, they
> depend on it...

Only on platforms where libc has chosen to provide an extension beyond
POSIX, and where GNU programs have not further overridden things to
avoid the unexpected glibc semantics.

-- 
Eric Blake   address@hidden    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
