bug-coreutils

bug#28847: Maybe a bug in "sort (GNU coreutils) 8.4" report


From: Eric Blake
Subject: bug#28847: Maybe a bug in "sort (GNU coreutils) 8.4" report
Date: Mon, 16 Oct 2017 06:49:35 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.3.0

tag 28847 notabug
thanks

On 10/15/2017 02:58 AM, kakaxixi777 wrote:
>    Dear coreutils :
>    I am a Research and Development Engineer in IT. I met a situation when
>    I use “sort” command in Linux shell which could be a bug for the "sort"
>    command. So I hope you read this email, thank you !
>    The whole command I used was :
>    sort test.txt
>    And the result was :
>    20171012|3|2059517|-|8|-|20-24|2|-|2.0|2.0
>    20171012|3|2059517|-|82|-|25-29|2|-|13.0|12.0
>    20171012|3|2059517|-|8|-|-2|-2|-|71.0|64.0
>    20171012|3|2059517|-|82|-|30-34|0|-|2.0|1.0
>    The content in test.txt was:
>    20171012|3|2059517|-|8|-|20-24|2|-|2.0|2.0
>    20171012|3|2059517|-|82|-|25-29|2|-|13.0|12.0
>    20171012|3|2059517|-|8|-|-2|-2|-|71.0|64.0
>    20171012|3|2059517|-|82|-|30-34|0|-|2.0|1.0

Your situation is a FAQ:
https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Sort-does-not-sort-in-normal-order_0021

Most likely, you are sorting in a locale that does not treat punctuation
with the same weight as digits, such as en_US.UTF8.  If you'll notice,
the substring '8202' sorts before '8225' which in turn is before '8227'
and finally '8230', once you've ignored the punctuation in '8|-|20-2',
'82|-|25', and so forth.
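
A rough way to check this by hand is sketched below. It assumes that
deleting every punctuation character approximates what the locale
compares first; real locale collation is more involved than this, so
treat it as an illustration rather than a description of what sort does
internally. The variable names are just for this example.

```shell
# The four lines from the report, in the order sort printed them.
data='20171012|3|2059517|-|8|-|20-24|2|-|2.0|2.0
20171012|3|2059517|-|82|-|25-29|2|-|13.0|12.0
20171012|3|2059517|-|8|-|-2|-2|-|71.0|64.0
20171012|3|2059517|-|82|-|30-34|0|-|2.0|1.0'

# Assumption: deleting punctuation approximates what the locale's
# first comparison pass sees; real collation is more involved.
keys=$(printf '%s\n' "$data" | LC_ALL=C tr -d '[:punct:]')
printf '%s\n' "$keys"

# sort -c exits with status 0 only when its input is already in order.
if printf '%s\n' "$keys" | LC_ALL=C sort -c; then
  echo 'digit-only keys are already in ascending order'
fi
```

With the punctuation gone, the reported output order is plain ascending
order of the remaining digits.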

>    Which means the “sort” command didn't work, because I think the correct
>    result should be :
>    20171012|3|2059517|-|8|-|20-24|2|-|2.0|2.0
>    20171012|3|2059517|-|8|-|20-24|2|-|2.0|2.0
>    20171012|3|2059517|-|82|-|25-29|2|-|13.0|12.0
>    20171012|3|2059517|-|82|-|25-29|2|-|13.0|12.0

Well, this isn't the right result either, as it is duplicating two lines
and missing two others (did you copy and paste incorrectly?).

>    The version of "sort" command I use is : sort --version
>    "sort (GNU coreutils) 8.4

This version is rather old; we are now at 8.28.  But with any version
from 8.6 onward, you can use sort's --debug feature to see where your
expectations are going wrong (99% of reports about sort misbehavior
turn out to be problems of misuse of either command line options or the
current locale).  Observe the difference:

$ printf \
'20171012|3|2059517|-|8|-|20-24|2|-|2.0|2.0\n20171012|3|2059517|-|82|-|25-29|2|-|13.0|12.0\n20171012|3|2059517|-|8|-|-2|-2|-|71.0|64.0\n' \
| LC_ALL=en_US.UTF8 sort --debug
sort: using ‘en_US.UTF8’ sorting rules
20171012|3|2059517|-|8|-|20-24|2|-|2.0|2.0
__________________________________________
20171012|3|2059517|-|82|-|25-29|2|-|13.0|12.0
_____________________________________________
20171012|3|2059517|-|8|-|-2|-2|-|71.0|64.0
__________________________________________

$ printf \
'20171012|3|2059517|-|8|-|20-24|2|-|2.0|2.0\n20171012|3|2059517|-|82|-|25-29|2|-|13.0|12.0\n20171012|3|2059517|-|8|-|-2|-2|-|71.0|64.0\n' \
| LC_ALL=C sort --debug
sort: using simple byte comparison
20171012|3|2059517|-|82|-|25-29|2|-|13.0|12.0
_____________________________________________
20171012|3|2059517|-|8|-|-2|-2|-|71.0|64.0
__________________________________________
20171012|3|2059517|-|8|-|20-24|2|-|2.0|2.0
__________________________________________

And if you want the lines containing '|8|' to sort before the lines
containing '|82|', then you can't use plain sort, which compares the
whole line.  Instead, use the -t, -k, and -n options to tell sort how
the fields are separated, which fields to sort on, and that those keys
should be treated as numbers rather than as character strings (when
sorting an entire line in ASCII, digits sort before |).
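
As a minimal sketch of such an invocation: the choice of field 5 as the
key is my assumption about what you want, based on where the 8 and 82
appear in your data, and the variable names are only for illustration.

```shell
# The four lines from the report.
data='20171012|3|2059517|-|8|-|20-24|2|-|2.0|2.0
20171012|3|2059517|-|82|-|25-29|2|-|13.0|12.0
20171012|3|2059517|-|8|-|-2|-2|-|71.0|64.0
20171012|3|2059517|-|82|-|30-34|0|-|2.0|1.0'

# -t'|'    : fields are separated by '|'
# -k5,5n   : the key is field 5 only, compared numerically
# LC_ALL=C : byte comparison, so the result does not depend on the locale
# Lines whose keys compare equal fall back to a whole-line comparison.
sorted=$(printf '%s\n' "$data" | LC_ALL=C sort -t'|' -k5,5n)
printf '%s\n' "$sorted"
```

Both lines whose fifth field is 8 now come before both lines whose
fifth field is 82.  Adding --debug to the same command underlines the
part of each line that is used as the key.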

>    I am not sure if it is a bug in "sort" command in Linux Shell or maybe
>    it's only my problems in using it.

I think I've demonstrated where the problem was, so I'm closing this as
not a bug.  Feel free to reply with further questions on the topic, though.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
