bug-coreutils

Re: sort seems deficient


From: Jim Meyering
Subject: Re: sort seems deficient
Date: Sun, 07 Sep 2008 21:36:23 +0200

"Dr. David Alan Gilbert" <address@hidden> wrote:

> * Jim Meyering (address@hidden) wrote:
>> address@hidden wrote:
>> > I just used sort on a redhat Enterprise 5 server.
>> >
>> > Sort seems to ignore leading "." characters.  This is incorrect.
>>
>> How sort works depends on your locale.
>> This link explains and tells you how to change that behavior:
>>
>> http://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Sort-does-not-sort-in-normal-order_0021
>
> That explanation is somewhat unclear about whether it's due to an unexpected
> behaviour of the locale or the locale tables actually being broken.
> I tried to read bits of the Unicode spec last time I hit this
> and came away not being entirely sure whether it was actually
> valid behaviour.
>
> If someone could point to something which says 'you should sort
> these non-alphanumeric characters like this' and the Linux one
> doesn't then perhaps someone will fix it.

The problem is with expectations.  People are not used to sort ignoring
non-alphanumeric characters, yet in certain locales it does, and that is
normal and required behavior.

Here's an example.
In the en_US locale, the leading punctuation characters are ignored,
because the locale tables (as required by standards) define the
collating sequences that way:

  $ printf '%s\n' _a .b ,c /d -e \:f| LC_ALL=en_US sort
  _a
  .b
  ,c
  /d
  -e
  :f
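
If you want an order like this one without depending on which locales
happen to be installed, GNU sort's -d (--dictionary-order) option, which
compares only blanks and alphanumeric characters, gives the same result
for this particular input even under LC_ALL=C.  This is a rough
approximation of the locale's collation rules, not how glibc implements
them:

```shell
# Approximate the en_US result using byte-based collation:
# -d makes sort compare only blanks and alphanumerics, so the leading
# punctuation is skipped when comparing; LC_ALL=C keeps the result
# independent of the caller's locale settings.
printf '%s\n' _a .b ,c /d -e :f | LC_ALL=C sort -d
```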

When collating with the C locale, those bytes *are* used, and the lines
come out in plain byte (ASCII) order:

  $ printf '%s\n' _a .b ,c /d -e \:f| LC_COLLATE=C sort
  ,c
  -e
  .b
  /d
  :f
  _a
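
One caveat: LC_ALL takes precedence over LC_COLLATE (and LANG), so a
command prefix of LC_COLLATE=C has no effect in an environment that
already exports LC_ALL.  Setting LC_ALL=C is the more robust way to get
byte order.  A minimal sketch, where the function name bytesort is
hypothetical:

```shell
# Hypothetical helper: always sort by byte value.
# LC_ALL overrides LC_COLLATE and LANG, so setting it for this one
# command wins over whatever the caller's environment exports.
bytesort() {
  LC_ALL=C sort "$@"
}

# Any sort options are passed through unchanged, e.g. bytesort -r.
printf '%s\n' _a .b ,c /d -e :f | bytesort
```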
