Re: possible bug in sort
From: Bob Proulx
Subject: Re: possible bug in sort
Date: Fri, 8 Dec 2006 22:27:46 -0700
User-agent: Mutt/1.5.9i
John Novatnack wrote:
> I ran across strange behavior of the Unix sort command.
Thanks for the bug report. But as you know GNU is Not Unix. :-)
What version of GNU sort are you using?
  sort --version
What is your locale setting?
  locale
> But now see what happens when I add a trailing zero.
>
> $ sort -n
> 0.1 2
> 0.1 3
> 0.1 1
> 0.10 2
>
> 0.10 2
> 0.1 1
> 0.1 2
> 0.1 3
I cannot reproduce your problem on my Debian system using either the
stock sort or the latest CVS sort. However, I do spot some problems
with your usage. The documentation says this:
Numeric sort uses what might be considered an unconventional
method to compare strings representing floating point numbers.
Rather than first converting each string to the C `double' type
and then comparing those values, `sort' aligns the decimal-point
characters in the two strings and compares the strings a character
at a time. One benefit of using this approach is its speed. In
practice this is much more efficient than performing the two
corresponding string-to-double (or even string-to-integer)
conversions and then comparing doubles. In addition, there is no
corresponding loss of precision. Converting each string to
`double' before comparison would limit precision to about 16
digits on most systems.
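A quick way to see the precision point, as a sketch that assumes GNU
sort and the C locale (the two values are made up for illustration):
they differ only in the 20th decimal digit, well past what a `double'
can distinguish, yet `sort -n' still orders them.

```shell
# Illustrative values (not from the report): identical up to the 20th
# decimal digit, so they would compare equal as C doubles.
a=0.12345678901234567891
b=0.12345678901234567892
# Feed them in reverse order; LC_ALL=C pins the decimal point to ".".
precise=$(printf '%s\n' "$b" "$a" | LC_ALL=C sort -n)
printf '%s\n' "$precise"
```

The smaller value should come out first, because the digits are
compared as characters rather than converted to floating point.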
Also:
A pair of lines is compared as follows: if any key fields have been
specified, `sort' compares each pair of fields, in the order specified
on the command line, according to the associated ordering options,
until a difference is found or no fields are left. Unless otherwise
specified, all comparisons use the character collating sequence
specified by the `LC_COLLATE' locale. (1)
And importantly:
For the large majority of applications, treating keys spanning
more than one field as numeric will not do what you expect.
Therefore you should specify the key fields explicitly. Since you
apparently want to sort first on the first field and then on the
second field, this is really what you want.
Try this:
  printf "0.1 2\n0.1 3\n0.1 1\n0.10 2\n" | sort -k 1,1n -k 2,2n
  0.1 1
  0.1 2
  0.10 2
  0.1 3
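Why does "0.10 2" land between "0.1 2" and "0.1 3"? The first key
treats 0.1 and 0.10 as equal, the second key orders the lines as 1, 2,
2, 3, and the two lines that tie on both keys are ordered by sort's
last-resort comparison of the whole line. Here is a small sketch of
that last step, assuming GNU sort and the C locale. The `-s'
(`--stable') option disables the last-resort comparison, so tied lines
keep their input order.

```shell
# Two lines that tie on both numeric keys (0.10 == 0.1 and 2 == 2).
tied='0.10 2
0.1 2'
# Default: the tie is broken by comparing the whole lines. In the C
# locale the space in "0.1 2" sorts before the "0" in "0.10 2".
default_order=$(printf '%s\n' "$tied" | LC_ALL=C sort -k 1,1n -k 2,2n)
# With -s the tied lines stay in their input order.
stable_order=$(printf '%s\n' "$tied" | LC_ALL=C sort -s -k 1,1n -k 2,2n)
printf 'default:\n%s\nstable:\n%s\n' "$default_order" "$stable_order"
```

The default run should print "0.1 2" before "0.10 2", and the stable
run should print them in the order given.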
Don't miss the note about sort respecting your current locale
settings.
(1) If you use a non-POSIX locale (e.g., by setting `LC_ALL' to
`en_US'), then `sort' may produce output that is sorted differently
than you're accustomed to. In that case, set the `LC_ALL' environment
variable to `C'. Note that setting only `LC_COLLATE' has two problems.
First, it is ineffective if `LC_ALL' is also set. Second, it has
undefined behavior if `LC_CTYPE' (or `LANG', if `LC_CTYPE' is unset) is
set to an incompatible value. For example, you get undefined behavior
if `LC_CTYPE' is `ja_JP.PCK' but `LC_COLLATE' is `en_US.UTF-8'.
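As a sketch of the footnote's advice, assuming GNU sort and using the
four lines from the report: setting `LC_ALL=C' for a single command
makes the whole-line tie-breaking byte-wise and predictable. Under a
plain `-n' with no key fields, all four lines compare as numerically
equal (only the leading 0.1 or 0.10 is read as the number), so the
order comes entirely from that whole-line comparison. In a locale such
as `en_US' the collation rules may weigh the space differently, which
is possibly what produced the order in the original report.

```shell
# LC_ALL=C applies only to this one sort invocation.
c_order=$(printf '0.1 2\n0.1 3\n0.1 1\n0.10 2\n' | LC_ALL=C sort -n)
printf '%s\n' "$c_order"
```

In the C locale this should print 0.1 1, 0.1 2, 0.1 3 and then 0.10 2,
since a space sorts before the digit 0 byte-wise.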
Bob