bug#9418: closed (Re: bug#9418: case sensitivity buggy in sort)
From: Michał Janke
Subject: bug#9418: closed (Re: bug#9418: case sensitivity buggy in sort)
Date: Fri, 2 Sep 2011 08:57:14 +0200
2011/9/2 Michał Janke <address@hidden>:
> 2011/9/1 GNU bug Tracking System <address@hidden>:
>> Your bug report
>>
>> #9418: case sensitivity buggy in sort
>>
>> which was filed against the coreutils package, has been closed.
>>
>> The explanation is attached below, along with your original report.
>> If you require more details, please reply to address@hidden
>>
>> --
>> 9418: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=9418
>> GNU Bug Tracking System
>> Contact address@hidden with problems
>>
>>
>> ---------- Forwarded message ----------
>> From: Eric Blake <address@hidden>
>> To: "Michał Janke" <address@hidden>
>> Date: Thu, 01 Sep 2011 10:32:45 -0600
>> Subject: Re: bug#9418: case sensitivity buggy in sort
>> tag 9418 notabug
>> thanks
>>
>> On 09/01/2011 02:58 AM, Michał Janke wrote:
>>>
>>> sort (GNU coreutils) 8.12
>>>
>>> The case-sensitivity looks buggy in sort. Have a look at these examples:
>>
>> Thanks for the report. However, this is most likely due to your choice of
>> locale, and not a bug in sort; this is a FAQ:
>> https://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021
>>
>> Using 'sort --debug' will help expose the issue.
>>
>>> $ sort -k1,2 bbb
>>> a B b 0
>>> A b b 1
>>> A B b 0
>>
>> $ sort --debug bbb -k1,2
>> sort: using `en_US.UTF-8' sorting rules
>> sort: leading blanks are significant in key 1; consider also specifying `b'
>> a B b 0
>> ___
>> _______
>> A b b 1
>> ___
>> _______
>> A B b 0
>> ___
>> _______
>> $ LC_ALL=C ../coreutils/src/sort --debug bbb -k1,2
>> ../coreutils/src/sort: using simple byte comparison
>> A B b 0
>> ___
>> _______
>> A b b 1
>> ___
>> _______
>> a B b 0
>> ___
>> _______
>>
>> See the difference? In the C locale, you get ASCII sorting (A comes before
>> B comes before a comes before b); in the en_US.UTF-8 locale, you get
>> dictionary collation sorting (a comes before A comes before b comes before
>> B).
>>
>> --
>> Eric Blake address@hidden +1-801-349-2682
>> Libvirt virtualization library http://libvirt.org
>>
>>
>>
>> ---------- Forwarded message ----------
>> From: "Michał Janke" <address@hidden>
>> To: address@hidden
>> Date: Thu, 1 Sep 2011 10:58:58 +0200
>> Subject: case sensitivity buggy in sort
>> sort (GNU coreutils) 8.12
>>
>> The case-sensitivity looks buggy in sort. Have a look at these examples:
>>
>> $ cat bbb
>> A B b 0
>> a B b 0
>> A b b 1
>>
>> $ sort bbb
>> a B b 0
>> A B b 0
>> A b b 1
>>
>> $ sort -k1,2 bbb
>> a B b 0
>> A b b 1
>> A B b 0
>>
>>
>> $ cat ccc
>> A 2 b 0
>> a 2 b 0
>> A 1 b 1
>>
>> $ sort ccc
>> A 1 b 1
>> a 2 b 0
>> A 2 b 0
>>
>> $ sort -k1 ccc
>> A 1 b 1
>> a 2 b 0
>> A 2 b 0
>>
>> $ sort -k1,2 ccc
>> A 1 b 1
>> a 2 b 0
>> A 2 b 0
>>
>> $ sort -k1,1 ccc
>> a 2 b 0
>> A 1 b 1
>> A 2 b 0
>>
>>
>> $ cat ddd
>> A2 b 0
>> a2 b 0
>> A1 b 1
>>
>> $ sort ddd
>> A1 b 1
>> a2 b 0
>> A2 b 0
>>
>> $ sort -k1 ddd
>> A1 b 1
>> a2 b 0
>> A2 b 0
>>
>> $ sort -k1,1 ddd
>> A1 b 1
>> a2 b 0
>> A2 b 0
>>
>> $ sort -k1,2 ddd
>> A1 b 1
>> a2 b 0
>> A2 b 0
>>
>> $ sort -k1,3 ddd
>> A1 b 1
>> a2 b 0
>> A2 b 0
>>
>>
>>
>>
>
> I definitely don't agree with the "locale issue" explanation. This is not
> a problem of one letter being treated as greater or less than another;
> the problem is that it is _sometimes_ one way and sometimes the other!
> Please have a closer look at this one:
>
> $ cat aaa
> aa 1
> AA 1
> Aa 0
>
> Now consider what the output of sort should be in the two cases A>a and A<a.
> If A>a, the result should be
> aa 1
> Aa 0
> AA 1
>
> If A<a, it should be
> AA 1
> Aa 0
> aa 1
>
> And now the actual result:
>
> $ sort aaa
> Aa 0
> aa 1
> AA 1
>
> So the lines are sorted primarily by the second column!
>
> It is true, though, that when the locale is changed to the native POSIX
> (C) locale, the sorting looks reasonable:
>
> $ LC_ALL=C sort aaa
> AA 1
> Aa 0
> aa 1
>
> So yes, the bug is visible only with a locale other than the standard
> C/POSIX one, but _no_, the results in those other locales are not correct.
> The capital and lower-case letters seem to be simply aliased.
>
If it is the _locale_ that decides that upper- and lower-case letters
are equal, then the bug is in the locale; the results look absurd.
Where should a bug report about the locale go?
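The outputs in this thread are consistent with a multi-level collation, which is what Eric's "dictionary collation" remark points at: at the first level blanks are ignored and case is folded, so `Aa 0` sorts before `aa 1` simply because `0` < `1`; case only breaks ties at a later level (lowercase first). The sketch below is a simplified, hypothetical model of that behaviour, not glibc's actual collation code; the function names `collate_key`, `locale_like_sort` and `c_locale_sort` are made up for illustration, and whitespace field splitting only approximates `sort -k`.

```python
"""Simplified model of why `sort` output in en_US.UTF-8 looks case-aliased.

Hypothetical sketch: this is NOT glibc's collation code, only a two-level
approximation that is consistent with the outputs shown in this thread.
"""


def collate_key(text):
    """Return a two-level sort key (assumed, simplified collation model).

    Level 1: letters and digits only, case-folded; blanks are ignored.
    Level 2: the same characters with case swapped, so that when level 1
    ties, a lowercase letter sorts before its uppercase counterpart.
    """
    kept = "".join(ch for ch in text if ch.isalnum())
    return (kept.lower(), kept.swapcase())


def locale_like_sort(lines, first_field=None, last_field=None):
    """Mimic `sort` (no key) or `sort -kFIRST,LAST` under the model above.

    Fields are split on whitespace (an approximation of sort's default
    field splitting). If the keys tie, the whole line is compared as a
    last resort, as GNU sort does unless -s is given.
    """
    def key_part(line):
        if first_field is None:
            return line
        fields = line.split()
        return " ".join(fields[first_field - 1:last_field])

    return sorted(
        lines,
        key=lambda line: (collate_key(key_part(line)), collate_key(line), line),
    )


def c_locale_sort(lines):
    """Model of LC_ALL=C: plain code-point (byte, for ASCII) comparison."""
    return sorted(lines)


if __name__ == "__main__":
    aaa = ["aa 1", "AA 1", "Aa 0"]
    print(locale_like_sort(aaa))  # ['Aa 0', 'aa 1', 'AA 1'], like `sort aaa`
    print(c_locale_sort(aaa))     # ['AA 1', 'Aa 0', 'aa 1'], like `LC_ALL=C sort aaa`

    bbb = ["A B b 0", "a B b 0", "A b b 1"]
    print(locale_like_sort(bbb, 1, 2))  # ['a B b 0', 'A b b 1', 'A B b 0']
```

Under this model the "sometimes one way, sometimes the other" effect is expected: case never decides the order while any later letter or digit still differs, which is why the second column appears to win in `sort aaa`. The same model also reproduces the `sort -k1,2 bbb` and `sort -k1,1 ccc` outputs quoted above.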