bug#7961: sort
From: Eric Blake
Subject: bug#7961: sort
Date: Wed, 02 Feb 2011 10:44:00 -0700
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.13) Gecko/20101209 Fedora/3.1.7-0.35.b3pre.fc14 Lightning/1.0b3pre Mnenhy/0.8.3 Thunderbird/3.1.7
On 02/02/2011 05:42 AM, Francesco Bettella wrote:
> hi,
> I may have bumped into an undesired feature/bug of sort, which appears to be
> still present in the version 8.9 of coreutils.
Thanks for the report. However, this is a feature, and not a bug, of sort.
>
> I'm issuing the following sort commands (see attached files):
>
> [prompt1] > sort -k 1.4,1n asd1 > asd1.sorted
>
> [prompt2] > sort -k 2.4,2n asd2 > asd2.sorted
If I'm correct, asd1 and asd2 have the same contents, except that you
have swapped columns 1 and 2 between the two and resorted the lines.
And your desired goal is that the output matches asd1.sorted, again with
the columns swapped for asd2.sorted.
>
> the first one works as I would expect, the second one doesn't.
Let's examine why:
$ head -3 asd1 | sort -k 1.4,1n --debug
sort: using `en_US.UTF-8' sorting rules
sort: leading blanks are significant in key 1; consider also specifying `b'
chr>coding_gene
^ no match for key
_______________
chr1>PRAMEF1
_
____________
chr1>PRAMEF4
_
____________
$ head -3 asd1 | LC_ALL=C sort -k 1.4,1n --debug
sort: using simple byte comparison
sort: leading blanks are significant in key 1; consider also specifying `b'
chr>coding_gene
^ no match for key
_______________
chr1>PRAMEF1
_
____________
chr1>PRAMEF4
_
____________
In both cases, a line with no match for the key compares as zero under
numeric sorting, so it sorts first; meanwhile, lines whose keys compare
equal fall back to a comparison of the complete line, so that the end
result matches asd1.sorted whether you use the C locale or the
dictionary sorting of en_US.UTF-8.
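A minimal sketch of that behavior, using made-up tab-separated lines rather than the actual asd1 file (the `sample1` helper and its contents are assumptions for illustration only). The line whose field 1 is just `chr` has an empty key and so compares as zero, and the two `chr1` lines tie on the key and are ordered by the whole-line fallback:

```shell
# Hypothetical sample data (tab-separated); NOT the reporter's asd1 file.
sample1() {
  printf 'chr2\tB\nchr\tcoding_gene\nchr10\tA\nchr1\tZ\nchr1\tC\n'
}

# Key: field 1, from character 4 to the end of field 1, compared numerically.
# LC_ALL=C makes the whole-line fallback a plain byte comparison.
sorted1=$(sample1 | LC_ALL=C sort -k 1.4,1n)
printf '%s\n' "$sorted1"
```

With this input the order should be `chr`, `chr1 C`, `chr1 Z`, `chr2`, `chr10`: the empty key first, then numeric order, with the tie between the two `chr1` lines broken by comparing the full lines.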
But notice that warning about not using -b, and how it affects asd2 (and
also, how the difference in dictionary vs. byte-ordering plays a role in
the secondary sorting):
$ head -3 asd2 | sort -k 2.4,2n --debug
sort: using `en_US.UTF-8' sorting rules
sort: leading blanks are significant in key 1; consider also specifying `b'
coding_gene>chr
^ no match for key
_______________
PRAMEF1>chr1
^ no match for key
____________
PRAMEF4>chr1
^ no match for key
____________
$ head -3 asd2 | LC_ALL=C sort -k 2.4,2n --debug
sort: using simple byte comparison
sort: leading blanks are significant in key 1; consider also specifying `b'
PRAMEF1>chr1
^ no match for key
____________
PRAMEF4>chr1
^ no match for key
____________
coding_gene>chr
^ no match for key
But when you add -b (note that b is the one option you have to attach to
the start field, since it applies to the start and end positions
separately; all other options can be attached to the start, the end, or
both, and affect the entire key):
$ head -3 asd2 | sort -k 2.4b,2n --debug
sort: using `en_US.UTF-8' sorting rules
coding_gene>chr
^ no match for key
_______________
PRAMEF1>chr1
_
____________
PRAMEF4>chr1
_
____________
$ head -3 asd2 | LC_ALL=C coreutils/src/sort -k 2.4b,2n --debug
coreutils/src/sort: using simple byte comparison
coding_gene>chr
^ no match for key
_______________
PRAMEF1>chr1
_
____________
PRAMEF4>chr1
_
____________
That is, your key specification was incomplete: sort correctly followed
what you told it to do, but what you told it was not what you meant,
because without b the tab in front of column 2 counts as part of the
field. And the --debug option is your [new] friend :)
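To reproduce the difference without the attached files, here is a small sketch on made-up tab-separated lines (the `sample2` helper and its data are assumptions, not the contents of asd2). It runs the same key with and without b, and also shows one possible alternative, declaring the tab as the field separator with -t so that the separator is never part of the field:

```shell
# Hypothetical sample data (tab-separated); NOT the reporter's asd2 file.
sample2() {
  printf 'B\tchr2\ncoding_gene\tchr\nA\tchr10\nZ\tchr1\nC\tchr1\n'
}
tab=$(printf '\t')

# Without b: field 2 starts at the tab, so character 4 is the "r" of "chr".
# Every key is non-numeric, all lines tie, and only the whole-line fallback
# decides the order.
without_b=$(sample2 | LC_ALL=C sort -k 2.4,2n)

# With b on the start position: leading blanks are skipped before counting,
# so character 4 is the first digit after "chr".
with_b=$(sample2 | LC_ALL=C sort -k 2.4b,2n)

# Alternative: make the tab the separator, so fields carry no leading blank.
with_t=$(sample2 | LC_ALL=C sort -t "$tab" -k 2.4,2n)

printf '%s\n\n%s\n\n%s\n' "$without_b" "$with_b" "$with_t"
```

In this sketch the first result is just the byte-wise order of the whole lines (A, B, C, Z, coding_gene), while the second and third agree with each other and follow the number after `chr`, with the `chr` line first.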
--
Eric Blake address@hidden +1-801-349-2682
Libvirt virtualization library http://libvirt.org