Re: sort --stable (-s) doesn't appear to work on my system
From: Eric Blake
Subject: Re: sort --stable (-s) doesn't appear to work on my system
Date: Tue, 8 Dec 2015 15:25:39 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0
On 12/08/2015 02:26 PM, Terry Farrah wrote:
> I have a tab-separated file that I think is already sorted on the first 3
> columns. Here is a 2-line sample in a file named foo:
>
> chr10 60379 60380 10:60380-60380 T/T
> chr10 60379 60380 10:60380-60380 G/T
>
> I try checking it with
>
> sort -s -k1,1V -k2,2n -k3,3n -c foo
>
> but the check fails:
>
> sort: foo:2: disorder: chr10 60379 60380 10:60380-60380 G/T
>
> If I sort it using the above key specification, it swaps the order of the
> lines:
>
> sort -s -k1,1V -k2,2n -k3,3n foo
>
> chr10 60379 60380 10:60380-60380 G/T
> chr10 60379 60380 10:60380-60380 T/T
Doesn't reproduce for me with Fedora's coreutils-8.23-11.fc22.x86_64:
$ printf 'chr10\t60379\t60380\t10:60380-60380\tT/T\nchr10\t60379\t60380\t10:60380-60380\tG/T\n' \
    | sort -s -k1,1V -k2,2n -k3,3n
chr10 60379 60380 10:60380-60380 T/T
chr10 60379 60380 10:60380-60380 G/T
> $ sort -s -k1,1V -k2,2n -k3,3n --debug foo
> sort: using ‘en_US.UTF-8’ sorting rules
> sort: leading blanks are significant in key 1; consider also specifying 'b'
> chr10>60379>60380>10:60380-60380>G/T
Awesome! Most bug reports fail to provide this important piece of
information.
You may want to follow the advice there of adding 'b' (as in -k1b,1V);
but as far as I can tell, it shouldn't be affecting the behavior you are
seeing (since your sample file didn't have leading whitespace).
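For reference, here is a rough sketch of what that suggestion looks like on the sample. The file name sample.tsv is only a placeholder, and the data is recreated with printf so the tabs are real. On a build without the problem, the output should simply repeat the input order, since all three keys compare equal and -s disables the last-resort comparison.

```shell
# Recreate the two-line, tab-separated sample ("sample.tsv" is a placeholder name).
printf 'chr10\t60379\t60380\t10:60380-60380\tT/T\nchr10\t60379\t60380\t10:60380-60380\tG/T\n' > sample.tsv

# Same keys as in the report, with 'b' added to key 1 so that
# leading blanks are ignored when locating the start of the key.
with_b=$(sort -s -k1b,1V -k2,2n -k3,3n sample.tsv)
printf '%s\n' "$with_b"
```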
> $ sort --version
> sort (GNU coreutils) 8.22
> $ more /etc/*-release
> ::::::::::::::
> /etc/oracle-release
> ::::::::::::::
> Oracle Linux Server release 7.1
> $ uname -r
> 3.8.13-68.1.2.el7uek.x86_64
I suspect that the most-likely culprit is a downstream vendor bug (it is
not the first time that vendor I18N patches have caused sort to
misbehave, where upstream is just fine). For example,
https://bugzilla.redhat.com/show_bug.cgi?id=1148347
says that some builds of RHEL 7 coreutils 8.22 had a broken I18N patch
that calls strcoll() on too much of the subject line. That would
certainly explain why your build seems affected, if the suffix 'G/T' vs.
'T/T' is being treated as significant, especially since you proved you
are using en_US.UTF-8 (and not LC_ALL=C).
But that's about all I can point to; at this point, you'll have to
take it up with Oracle.
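Before filing, it may be worth rerunning the same check with LC_ALL=C, which takes locale-aware collation out of the picture. This is only a sketch: sample.tsv is a placeholder name for the two-line sample, and the variable name is made up for illustration. If the check passes under LC_ALL=C but fails under en_US.UTF-8, that would be consistent with the I18N patch theory, though it would not prove it.

```shell
# Recreate the two-line, tab-separated sample ("sample.tsv" is a placeholder name).
printf 'chr10\t60379\t60380\t10:60380-60380\tT/T\nchr10\t60379\t60380\t10:60380-60380\tG/T\n' > sample.tsv

# Run the order check from the report in the C locale.
# sort -c exits 0 when the input is already ordered, non-zero on disorder.
if LC_ALL=C sort -s -k1,1V -k2,2n -k3,3n -c sample.tsv; then
    c_locale_result=ordered
else
    c_locale_result=disorder
fi
echo "C locale: $c_locale_result"
```

Running the same command without the LC_ALL=C prefix gives the result for the current locale, for comparison.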
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org