bug-coreutils

bug#9580: sort 8.5 bug?


From: Eric Blake
Subject: bug#9580: sort 8.5 bug?
Date: Thu, 22 Sep 2011 16:01:44 -0600
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.22) Gecko/20110906 Fedora/3.1.14-1.fc14 Lightning/1.0b3pre Mnenhy/0.8.3 Thunderbird/3.1.14

tag 9580 notabug
thanks

On 09/22/2011 02:55 PM, Sean Sun wrote:
> So basically, appending a letter after '.' would reverse the sort order.
> That doesn't look quite right. Is there an explanation for this behavior?
> I've tried the same on a Mac, and their sort (5.93) works just fine.

Thanks for the report, but this is not a bug in sort. Both versions that you tried (8.5 and 5.93) actually sort the same way; the difference is in your choice of locale, and you are hitting this FAQ:
https://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021

Newer coreutils added a --debug option to help you learn why the bug is in your expectations and not in sort (8.13 is current, but --debug has been present since 8.6). So let's use it:

$ printf '.\nBAD.\n.s\nBAD.s\n' | sort --debug
sort: using `en_US.UTF-8' sorting rules
.
_
BAD.
____
BAD.s
_____
.s
__

$ printf '.\nBAD.\n.s\nBAD.s\n' | LC_ALL=C sort --debug
sort: using simple byte comparison
.
_
.s
__
BAD.
____
BAD.s
_____


Remember, the en_US.UTF-8 locale uses dictionary-order collation, which treats punctuation as insignificant and blends case. That is, 's' and '.s' collate as the same string, and '.s' is larger than 'BAD.' since 's' comes later in the alphabet than 'B'.

On the other hand, the C locale uses ASCII ordering, where every byte is significant, and '.' sorts before 'B'.
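
To check which ordering a given environment produces, the same four lines can be sorted under an explicit locale. The following is only a minimal sketch: the helper name `sample` and the variable `c_sorted` are made up for illustration, and the result for en_US.UTF-8 depends on that locale being installed on the machine, so only the C-locale case is pinned down here.

```shell
# Hypothetical helper: emits the four sample lines from the report.
sample() { printf '.\nBAD.\n.s\nBAD.s\n'; }

# Show the settings sort would otherwise inherit.
# LC_ALL overrides LC_COLLATE, which in turn overrides LANG.
echo "LC_ALL=${LC_ALL-unset} LC_COLLATE=${LC_COLLATE-unset} LANG=${LANG-unset}"

# Force byte-wise comparison for this one command only.
c_sorted=$(sample | LC_ALL=C sort)
printf '%s\n' "$c_sorted"
```

With LC_ALL=C in effect for the sort command, the lines come out as '.', '.s', 'BAD.', 'BAD.s', matching the second transcript above.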


> I've also tried set LC_ALL='C', just in case it's a funky locale problem,
> but it didn't make a difference.

Are you sure you used the correct syntax? The way you wrote it, it looks like you tried:

$ set LC_ALL='C'

But that is neither sh (export LC_ALL=C) nor csh (setenv LC_ALL C) syntax. Your problem is fully explained by locale, and would be "solved" if you had actually set LC_ALL=C as you meant to.
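
For a POSIX sh, the difference between the two spellings can be seen directly. This is a small sketch, assuming an sh-compatible shell; the variable names `first_arg`, `after_set`, and `after_export` are invented for the demonstration.

```shell
# Start from a known state.
unset LC_ALL

# In sh, `set WORD` replaces the positional parameters;
# it does not create or export a variable named LC_ALL.
set LC_ALL='C'
first_arg=$1                # the literal string LC_ALL=C
after_set=${LC_ALL-unset}   # LC_ALL itself is still unset

# An exported assignment is what child processes such as sort will see.
export LC_ALL=C
after_export=$(sh -c 'printf %s "$LC_ALL"')

printf '%s\n' "$first_arg" "$after_set" "$after_export"
```

To affect a single command without changing the rest of the session, the assignment can instead be prefixed to it, as in the `LC_ALL=C sort --debug` invocation earlier in this message.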

--
Eric Blake   address@hidden    +1-801-349-2682
Libvirt virtualization library http://libvirt.org




