Re: sort seems deficient
From: Jim Meyering
Subject: Re: sort seems deficient
Date: Sun, 07 Sep 2008 21:36:23 +0200

"Dr. David Alan Gilbert" <address@hidden> wrote:
> * Jim Meyering (address@hidden) wrote:
>> address@hidden wrote:
>> > I just used sort on a redhat Enterprise 5 server.
>> >
>> > Sort seems to ignore leading "." characters. This is incorrect.
>>
>> How sort works depends on your locale.
>> This link explains and tells you how to change that behavior:
>>
>> http://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Sort-does-not-sort-in-normal-order_0021
>
> That explanation is somewhat unclear whether it's due to an unexpected
> behaviour of the locale or the locale tables actually being broken.
> I tried to read bits of the Unicode spec last time I hit this
> and came away not being entirely sure whether it was actually
> valid behaviour.
>
> If someone could point to something which says 'you should sort
> these non-alphanumeric characters like this' and the Linux one
> doesn't then perhaps someone will fix it.

The problem is with expectations. People are not used to sort ignoring
non-alphanumerics, yet with certain locales it does, and that is normal
and required behavior.

Here's an example.
In the en_US locale, the leading punctuation bytes are ignored, because
the locale tables (as required by standards) define the
collating sequences that way, so the lines come out ordered by the
letter that follows the punctuation:

  $ printf '%s\n' _a .b ,c /d -e \:f | LC_ALL=en_US sort
  _a
  .b
  ,c
  /d
  -e
  :f

When collating with the C locale, those bytes *are* used:

  $ printf '%s\n' _a .b ,c /d -e \:f | LC_COLLATE=C sort
  ,c
  -e
  .b
  /d
  :f
  _a
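
One caveat when reproducing the C-locale run: if LC_ALL is already set in your environment, it takes precedence over LC_COLLATE, so setting LC_COLLATE=C alone may have no visible effect. A minimal sketch that forces byte-order collation for a single invocation (the variable name `sorted` is only illustrative, not anything sort requires):

```shell
# Sketch: force byte-order (C locale) collation for one sort invocation.
# LC_ALL overrides both LC_COLLATE and LANG, so setting it here works
# even when LC_ALL is already exported in the calling environment.
sorted=$(printf '%s\n' _a .b ,c /d -e :f | LC_ALL=C sort)

# Print the result, one item per line, ordered by byte value.
printf '%s\n' "$sorted"
```

Because the variable is set only for that one command, the rest of the shell session keeps its usual locale.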