Re: locale specific ordering in EN_US -- why is a<A<b<B<y<Y<z<Z?

From: Chet Ramey
Subject: Re: locale specific ordering in EN_US -- why is a<A<b<B<y<Y<z<Z?
Date: Thu, 27 Jun 2013 15:13:12 -0400
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:17.0) Gecko/20130509 Thunderbird/17.0.6

On 6/27/13 4:48 AM, Paolo Bonzini wrote:
> Il 27/06/2013 09:33, Aharon Robbins ha scritto:
>> Hi Paolo.
>>>> I still believe that there is no place other than the glibc locale
>>>> descriptions where this can be fixed.
>> This is necessary but not sufficient. All of gawk, grep, sed and bash
>> run on lots of non-GLIBC systems.
> On non-glibc systems they use gnulib's regex implementation, so they're
> fine.

You presume much.  Bash, for instance, doesn't use a regex implementation,
especially not gnulib's.  gnulib code is, in practice, difficult to use on
an individual module basis, and doesn't provide enough of a benefit to go
through the effort of breaking it out of gnulib and putting it into bash.

>> The locale definitions, even for
>> the same locale, vary wildly out in the wild.  Therefore there's no
>> other practical choice but to fix each program to provide Rational
>> Range Interpretation.
>> Fortunately, gawk and grep are already there, and I think the sed in
>> the git repo is as well.  Once Bash turns this on as default, the
>> world will definitely be a better place, independent of GLIBC.
> I already explained multiple times why this is completely delusional.

A little bit strong, no?  If you use your own matching code, it's a small
matter to change strcoll to strcmp.
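
As a rough illustration of that swap, here is a minimal sketch of a range
test for a bracket expression such as [a-z].  The helper name
range_contains and its use_collation flag are hypothetical and are not
taken from bash or any other matcher, and the sketch handles single-byte
characters only.

```c
#include <string.h>

/* Hypothetical helper: does c fall inside the bracket range [lo-hi]?
 * With use_collation nonzero the endpoints are compared with strcoll(),
 * so the result follows the current locale's collation order.
 * With use_collation zero they are compared with strcmp(), so the
 * result follows byte values only.
 * Assumption: single-byte characters; multibyte input is out of scope. */
static int range_contains(char lo, char hi, char c, int use_collation)
{
    char l[2] = { lo, '\0' };
    char h[2] = { hi, '\0' };
    char s[2] = { c, '\0' };
    /* Both functions share the same signature, so the swap is one pointer. */
    int (*cmp)(const char *, const char *) = use_collation ? strcoll : strcmp;

    return cmp(l, s) <= 0 && cmp(s, h) <= 0;
}
```

With use_collation set to 0, range_contains('a', 'z', 'B', 0) is false in
an ASCII-compatible encoding, whatever the locale.  With use_collation set
to 1, the answer depends on the collation rules installed for the current
locale, and that dependence is the behavior under discussion in this thread.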

> 1) grep, sed, coreutils and so on will only use representation-based
> range interpretation (I prefer this more neutral term that also explains
> what's going on) if you use gnulib's regex implementation.  And by
> default, they use glibc (I just checked grep).
> 2) Even if you switched the default, you would be at the mercy of
> distros.  Distros prefer to avoid glibc replacements in single packages,
> because then all bugs have to be fixed in many different places.  In
> fact, I checked grep and Fedora builds it with --without-included-regex.

There are systems of interest besides Linux and its distros.

> Not to mention how this is entirely Latin-centric.  There are some
> encodings in which there is absolutely no relation between the encoding
> and the expected collation order.

And there's no portable way to obtain this information in any case, glibc
or not.  So if this can be `fixed' only by changing every locale
definition everywhere or by changing the matching code, I vote for changing
the matching code.  We just have to agree on an interpretation and make
sure the various matchers agree.

``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    address@hidden    http://cnswww.cns.cwru.edu/~chet/
