On 02/17/2011 01:46 PM, Bob Harris wrote:
> Howdy,
>
> (note: I know I should give you version information with this, but (1) I
> am not sure that this message will be read by anyone, and (2) I think
> the problem probably transcends versions. If I get a response and the
> actual version is important, I will take the time to find it.)
Thanks for the report, and you are correct that your issue transcends
versions. However, if you use coreutils 8.6 or newer (the latest is
8.10), then the new --debug option would have helped you.
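As a rough sketch of how --debug could be tried on the two keys from the report (the variable names key1, key2 and debug_out are placeholders for this illustration only, and the guard assumes a sort without --debug rejects the option with a non-zero exit status):

```shell
# The two keys quoted in the report.
key1='HWI-ST407_110127_0082_A80L25ABXX:5:2:11746:46371#0/1'
key2='HWI-ST407_110127_0082_A80L25ABXX:5:21:17464:6371#0/1'

# Only use --debug if this sort accepts it (GNU coreutils 8.6 or newer).
if sort --debug </dev/null >/dev/null 2>&1; then
  # Diagnostics go to stderr, so fold them into the captured output.
  debug_out=$(printf '%s\n' "$key1" "$key2" | sort --debug 2>&1)
else
  debug_out='sort --debug is not available here'
fi

printf '%s\n' "$debug_out"
```

With GNU sort, the diagnostics should include a line naming the sorting rules in effect (a locale name, or simple byte comparison), and each input line is followed by a marker line showing the portion used as the key. The exact wording differs between versions, so no sample output is shown.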
> I have a file of genomic short sequence info in which it so happens that
> two of my sort key values are similar. The two keys are
>   HWI-ST407_110127_0082_A80L25ABXX:5:2:11746:46371#0/1
>   HWI-ST407_110127_0082_A80L25ABXX:5:21:17464:6371#0/1
> As you can see, these are identical if one removes the colons.
Which sounds like exactly what sort does when you are sorting in the
en_US.UTF-8 locale.
> I have tried several different options but none seem to work. -d seems
> to be the default, and it has the behavior indicated above. -n fails
> completely. -g also fails. Reading the man page, I don't see any other
> options to control the comparison function.
Then you missed this part (in the sort man page, which is in turn
generated from 'sort --help'):
    *** WARNING ***
    The locale specified by the environment affects sort order.
    Set LC_ALL=C to get the traditional sort order that uses
    native byte values.
> I understand *why* -d considers these two keys equal. What I don't
> understand is why there is no option that says "order them
> lexicographically".
That option is your set of locale-specific environment variables. Why
it's not an explicit option is due to historical accident (that's the
way POSIX specified it). Maybe GNU sort should add a
--collate-locale=... option as an extension that overrides LC_ALL, but
that seems a bit like bloat, and doesn't buy much over using the
standardized means of choosing collation sequencing.
> Is there a hidden sort option that will do what I need?
Yep - try 'LC_ALL=C sort ...' to see the difference.
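For illustration, a minimal sketch using the two keys from the report (the variable name sorted is a placeholder for this example). Setting LC_ALL=C on the command line affects only that one invocation of sort, so the rest of the shell session keeps its normal locale:

```shell
# Feed the two keys to sort in their original order, forcing
# byte-wise comparison for this single command.
sorted=$(printf '%s\n' \
  'HWI-ST407_110127_0082_A80L25ABXX:5:2:11746:46371#0/1' \
  'HWI-ST407_110127_0082_A80L25ABXX:5:21:17464:6371#0/1' |
  LC_ALL=C sort)

printf '%s\n' "$sorted"
```

In the C locale the colons are compared like any other byte. The keys first differ right after ":5:2", where one has ':' (0x3A) and the other has '1' (0x31), so the ":5:21:" key comes out first and the two are no longer treated as equal.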
> I'm pretty sure I'm not the first person to run into this problem.
You're not. It's a FAQ:
http://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021
--
Eric Blake address@hidden +1-801-349-2682
Libvirt virtualization library http://libvirt.org