bug#9418: closed (Re: bug#9418: case sensitivity buggy in sort)
From: Michał Janke
Subject: bug#9418: closed (Re: bug#9418: case sensitivity buggy in sort)
Date: Fri, 2 Sep 2011 08:46:23 +0200
2011/9/1 GNU bug Tracking System <address@hidden>:
> Your bug report
>
> #9418: case sensitivity buggy in sort
>
> which was filed against the coreutils package, has been closed.
>
> The explanation is attached below, along with your original report.
> If you require more details, please reply to address@hidden
>
> --
> 9418: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=9418
> GNU Bug Tracking System
> Contact address@hidden with problems
>
>
> ---------- Forwarded message ----------
> From: Eric Blake <address@hidden>
> To: "Michał Janke" <address@hidden>
> Date: Thu, 01 Sep 2011 10:32:45 -0600
> Subject: Re: bug#9418: case sensitivity buggy in sort
> tag 9418 notabug
> thanks
>
> On 09/01/2011 02:58 AM, Michał Janke wrote:
>>
>> sort (GNU coreutils) 8.12
>>
>> The case-sensitivity looks buggy in sort. Have a look at these examples:
>
> Thanks for the report. However, this is most likely due to your choice of
> locale, and not a bug in sort; this is a FAQ:
> https://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021
>
> Using 'sort --debug' will help expose the issue.
>
>> $ sort -k1,2 bbb
>> a B b 0
>> A b b 1
>> A B b 0
>
> $ sort --debug bbb -k1,2
> sort: using `en_US.UTF-8' sorting rules
> sort: leading blanks are significant in key 1; consider also specifying `b'
> a B b 0
> ___
> _______
> A b b 1
> ___
> _______
> A B b 0
> ___
> _______
> $ LC_ALL=C ../coreutils/src/sort --debug bbb -k1,2
> ../coreutils/src/sort: using simple byte comparison
> A B b 0
> ___
> _______
> A b b 1
> ___
> _______
> a B b 0
> ___
> _______
>
> See the difference? In the C locale, you get ascii sorting (A comes before B
> comes before a comes before b), in the en_US.UTF-8 locale, you get dictionary
> collation sorting (a comes before A comes before b comes before B).
>
> --
> Eric Blake address@hidden +1-801-349-2682
> Libvirt virtualization library http://libvirt.org
>
>
>
> ---------- Forwarded message ----------
> From: "Michał Janke" <address@hidden>
> To: address@hidden
> Date: Thu, 1 Sep 2011 10:58:58 +0200
> Subject: case sensitivity buggy in sort
> sort (GNU coreutils) 8.12
>
> The case-sensitivity looks buggy in sort. Have a look at these examples:
>
> $ cat bbb
> A B b 0
> a B b 0
> A b b 1
>
> $ sort bbb
> a B b 0
> A B b 0
> A b b 1
>
> $ sort -k1,2 bbb
> a B b 0
> A b b 1
> A B b 0
>
>
> $ cat ccc
> A 2 b 0
> a 2 b 0
> A 1 b 1
>
> $ sort ccc
> A 1 b 1
> a 2 b 0
> A 2 b 0
>
> $ sort -k1 ccc
> A 1 b 1
> a 2 b 0
> A 2 b 0
>
> $ sort -k1,2 ccc
> A 1 b 1
> a 2 b 0
> A 2 b 0
>
> $ sort -k1,1 ccc
> a 2 b 0
> A 1 b 1
> A 2 b 0
>
>
> $ cat ddd
> A2 b 0
> a2 b 0
> A1 b 1
>
> $ sort ddd
> A1 b 1
> a2 b 0
> A2 b 0
>
> $ sort -k1 ddd
> A1 b 1
> a2 b 0
> A2 b 0
>
> $ sort -k1,1 ddd
> A1 b 1
> a2 b 0
> A2 b 0
>
> $ sort -k1,2 ddd
> A1 b 1
> a2 b 0
> A2 b 0
>
> $ sort -k1,3 ddd
> A1 b 1
> a2 b 0
> A2 b 0
>
>
>
>
I definitely don't agree with the "locale issue" explanation. The problem
is not that some letter is treated as greater or less than another;
the problem is that it is _sometimes_ one way and sometimes the other!
Please have a closer look at this one:
$ cat aaa
aa 1
AA 1
Aa 0
Now consider what the output of sort should be in two cases: A>a and A<a.
If A>a, the result should be
aa 1
Aa 0
AA 1
If A<a, it should be
AA 1
Aa 0
aa 1
And now the actual result:
$ sort aaa
Aa 0
aa 1
AA 1
So the lines are sorted primarily according to the second column!
But true, when the locale is changed to native POSIX, the sorting is done reasonably:
$ LC_ALL=C sort aaa
AA 1
Aa 0
aa 1
So yes, the bug is visible only with a non-POSIX locale, but
_no_ - the results with other locales are not correct.
The capital and lower-case letters seem to be just aliased.
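
For anyone who wants to repeat the comparison, here is a minimal sketch. It assumes the en_US.UTF-8 locale is installed on the machine (if it is not, sort may silently fall back to byte comparison), and the variable names are only illustrative. It sorts the same three lines in the C locale, in en_US.UTF-8 over the whole line, and in en_US.UTF-8 with the key limited to the first field, so the three orderings can be put side by side.

```shell
# Recreate the three-line test file from this message in a temporary file.
tmp=$(mktemp)
printf 'aa 1\nAA 1\nAa 0\n' > "$tmp"

# Byte-wise comparison: does not depend on which locales are installed.
c_out=$(LC_ALL=C sort "$tmp")

# Locale-aware comparison of the whole line.
# Assumption: en_US.UTF-8 is available on this system.
loc_out=$(LC_ALL=en_US.UTF-8 sort "$tmp" 2>/dev/null)

# Same locale, but with the sort key restricted to the first field.
key_out=$(LC_ALL=en_US.UTF-8 sort -k1,1 "$tmp" 2>/dev/null)

printf 'C locale:\n%s\n\n' "$c_out"
printf 'en_US.UTF-8, whole line:\n%s\n\n' "$loc_out"
printf 'en_US.UTF-8, -k1,1:\n%s\n' "$key_out"

rm -f "$tmp"
```

Adding --debug to any of the three sort calls prints which collation rules are in use and underlines the part of each line that is compared.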