emacs-bug-tracker

[debbugs-tracker] bug#18540: closed (Sorting bug?)


From: GNU bug Tracking System
Subject: [debbugs-tracker] bug#18540: closed (Sorting bug?)
Date: Tue, 23 Sep 2014 21:37:02 +0000

Your message dated Tue, 23 Sep 2014 15:36:53 -0600
with message-id <address@hidden>
and subject line Re: bug#18540: Sorting bug?
has caused the debbugs.gnu.org bug report #18540,
regarding Sorting bug?
to be marked as done.

(If you believe you have received this mail in error, please contact
address@hidden)


-- 
18540: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18540
GNU Bug Tracking System
Contact address@hidden with problems
--- Begin Message ---
Subject: Sorting bug?
Date: Tue, 23 Sep 2014 22:24:45 +0200
I discovered a behaviour of "sort" that looks like a bug to me.  When
one key in the input is an initial part of another key, the shorter
key is sorted first if the key is all there is on the line.  But if
there are other fields too, not included in the key, the order
changes.  That is true even with the --stable flag, so "sort" seems to
consider the order of the keys different in the two cases.

I sort in a non-C locale (sv_SE.utf8, actually), but en_US.utf8 behaves
the same, so I illustrate using that.

First case, the key is all there is on the line.  The shorter line
gets sorted earlier, regardless of input order:

    address@hidden Hämtat]$ { echo 'binutils x86_64'; echo 'binutils-x86_64-linux-gnu x86_64'; } | LANG=en_US.utf8 sort --stable --debug --key=1,1 --field-separator=!
    sort: using ‘en_US.utf8’ sorting rules
    binutils x86_64
    _______________
    binutils-x86_64-linux-gnu x86_64
    ________________________________
    address@hidden Hämtat]$ { echo 'binutils-x86_64-linux-gnu x86_64'; echo 'binutils x86_64'; } | LANG=en_US.utf8 sort --stable --debug --key=1,1 --field-separator=!
    sort: using ‘en_US.utf8’ sorting rules
    binutils x86_64
    _______________
    binutils-x86_64-linux-gnu x86_64
    ________________________________



Second case, the input lines contain a second field.  Now the longer
field gets sorted earlier, regardless of input order:

    address@hidden Hämtat]$ { echo 'binutils x86_64!new'; echo 'binutils-x86_64-linux-gnu x86_64!new'; } | LANG=en_US.utf8 sort --stable --debug --key=1,1 --field-separator=!
    sort: using ‘en_US.utf8’ sorting rules
    binutils-x86_64-linux-gnu x86_64!new
    ________________________________
    binutils x86_64!new
    _______________
    address@hidden Hämtat]$ { echo 'binutils-x86_64-linux-gnu x86_64!new'; echo 'binutils x86_64!new'; } | LANG=en_US.utf8 sort --stable --debug --key=1,1 --field-separator=!
    sort: using ‘en_US.utf8’ sorting rules
    binutils-x86_64-linux-gnu x86_64!new
    ________________________________
    binutils x86_64!new
    _______________


I can't see any reason for this.  Is it me not understanding sorting,
or is it actually a bug?
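One way to see where the flip could come from: assume (a simplification, not glibc's real multi-level algorithm) that en_US.utf8 ignores spaces and punctuation at its first collation level. The sketch below approximates that level by keeping only letters and digits, then compares the result bytewise in the C locale. The helpers `first_level`, `key_of` and `which_first` are hypothetical names for illustration. Comparing only field 1 puts the shorter key first, because "binutilsx8664" is a prefix of the longer key; comparing the whole lines puts the longer line first, because "...x8664new" meets "...x8664linux..." and "l" sorts before "n". That matches both cases above, which hints that the whole line, not just the key, reaches the comparison.

```shell
#!/bin/sh
# Sketch only. Assumption: the en_US.utf8 first collation level ignores
# spaces and punctuation. We approximate it by keeping letters and digits
# and comparing bytewise in the C locale; real glibc collation has several
# levels and many more rules.
LC_ALL=C
export LC_ALL

# Hypothetical helper: reduce a string to its letters and digits.
first_level() {
  printf '%s' "$1" | tr -cd '[:alnum:]'
}

# Hypothetical helper: the key that -t'!' -k1,1 selects, i.e. field 1
# (everything before the first '!').
key_of() {
  printf '%s' "${1%%!*}"
}

# Hypothetical helper: print whichever of the full lines $1 and $2 comes
# first when the comparison strings $3 and $4 are reduced by first_level.
# Ties keep the input order, as --stable would.
which_first() {
  x=$(first_level "$3")
  y=$(first_level "$4")
  smallest=$(printf '%s\n%s\n' "$x" "$y" | sort | head -n 1)
  if [ "$x" != "$y" ] && [ "$smallest" = "$y" ]; then
    printf '%s\n' "$2"
  else
    printf '%s\n' "$1"
  fi
}

a='binutils x86_64!new'
b='binutils-x86_64-linux-gnu x86_64!new'

# Comparing only field 1: "binutilsx8664" is a prefix of the other key.
echo "key only:   $(which_first "$a" "$b" "$(key_of "$a")" "$(key_of "$b")")"
# Comparing whole lines: "binutilsx8664new" vs "binutilsx8664linux...".
echo "whole line: $(which_first "$a" "$b" "$a" "$b")"
```

Swapping the argument order gives the same winners, consistent with the "regardless of input order" observation.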



--- End Message ---
--- Begin Message ---
Subject: Re: bug#18540: Sorting bug?
Date: Tue, 23 Sep 2014 15:36:53 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.0
tag 18540 notabug
thanks

On 09/23/2014 02:58 PM, Eric Blake wrote:

> Let's look further:
> 

> $ printf 'a b!x\na-b-c!x\n' | LANG=en_US.utf8 ltrace -e strcoll sort -s --debug -k1,1 -t!
> sort: using ‘en_US.utf8’ sorting rules
> sort->strcoll("a b!x", "a-b-c!x")                = 21
> a-b-c!x
> _____
> a b!x
> ___
> +++ exited (status 0) +++

Hmm, I just noticed something.

> 
> 
> Huh? Why are we passing the ENTIRE line to strcoll?  Shouldn't we only
> be passing the key?

That was my distro's build of sort (in my case, Fedora 20, with sort
from GNU coreutils 8.21).  But looking at coreutils.git (v8.23-39-g1ff4d08),

$ printf 'a b!x\na-b-c!x\n' | LANG=en_US.utf8 ltrace -e strcoll ./src/sort -s --debug -k1,1 -t!
./src/sort: using ‘en_US.utf8’ sorting rules
sort->strcoll("a b", "a-b-c")                    = -1
a b!x
___
a-b-c!x
_____
+++ exited (status 0) +++

Yay - strcoll now uses the correct bounds.  Next step - determining if
this is an upstream problem that was fixed in the interim, or if this is
a bug in the downstream additions on top of stock upstream.  None of the
9 commits in 'git shortlog v8.21.. src/sort.c' seem to describe the
situation.
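The two strcoll traces differ only in the bounds of the strings passed in, and only the sign of strcoll's result is meaningful. The sketch below reproduces those signs under an explicit assumption (not glibc's actual algorithm): the en_US.utf8 first collation level ignores spaces and punctuation, approximated here by keeping letters and digits and comparing bytewise in the C locale. `first_level` and `cmp_sign` are hypothetical helper names; `cut -d'!' -f1` extracts the same field that `-t! -k1,1` selects.

```shell
#!/bin/sh
# Sketch only. Assumption: the en_US.utf8 first collation level ignores
# spaces and punctuation; we approximate it by keeping letters and digits
# and comparing bytewise in the C locale. This is not glibc's algorithm.
LC_ALL=C
export LC_ALL

# Hypothetical helper: reduce a string to its letters and digits.
first_level() {
  printf '%s' "$1" | tr -cd '[:alnum:]'
}

# Hypothetical helper: print -1, 0 or 1, mirroring the sign of
# strcoll("$1", "$2") under the approximation above.
cmp_sign() {
  a=$(first_level "$1")
  b=$(first_level "$2")
  if [ "$a" = "$b" ]; then
    echo 0
  elif [ "$(printf '%s\n%s\n' "$a" "$b" | sort | head -n 1)" = "$a" ]; then
    echo -1
  else
    echo 1
  fi
}

# Whole lines, as in the Fedora 20 trace ("abx" vs "abcx"): prints 1.
cmp_sign 'a b!x' 'a-b-c!x'
# Field 1 only, as in the upstream trace ("ab" vs "abc"): prints -1.
cmp_sign "$(printf 'a b!x\n' | cut -d'!' -f1)" "$(printf 'a-b-c!x\n' | cut -d'!' -f1)"
```

Under this model the whole-line comparison is positive (the trace shows 21) and the key-only comparison is negative (the trace shows -1), matching the opposite orders the two builds print.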

And looking at my distro's patches, there is definitely some gorp added
to sort.c in coreutils-i18n.patch, which I highly suspect to be the root
cause.

So please re-raise this as a downstream bug in your distro's i18n patch,
as upstream coreutils is immune.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



--- End Message ---
