bug#19021: closed (Re: bug#19021: Possible bug in sort)


From: Ben Mendis
Subject: bug#19021: closed (Re: bug#19021: Possible bug in sort)
Date: Tue, 11 Nov 2014 15:07:27 -0500

Thanks for the explanation. This solves my issue.

On Tue, Nov 11, 2014 at 12:40 PM, GNU bug Tracking System <
address@hidden> wrote:

> Your bug report
>
> #19021: Possible bug in sort
>
> which was filed against the coreutils package, has been closed.
>
> The explanation is attached below, along with your original report.
> If you require more details, please reply to address@hidden
>
> --
> 19021: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=19021
> GNU Bug Tracking System
> Contact address@hidden with problems
>
>
> ---------- Forwarded message ----------
> From: Eric Blake <address@hidden>
> To: Ben Mendis <address@hidden>, address@hidden
> Cc:
> Date: Tue, 11 Nov 2014 10:39:13 -0700
> Subject: Re: bug#19021: Possible bug in sort
> tag 19021 notabug
> thanks
>
> On 11/11/2014 09:39 AM, Ben Mendis wrote:
> > http://stackoverflow.com/questions/26869717/why-does-sort-seem-to-sort-a-field-incorrectly-based-on-the-presence-or-absenc
> >
> > Data is here: https://gist.github.com/anonymous/2a7beb4871b25ae8f8b3
>
> Thanks for the report.  Rather than making us chase down links, why not
> provide the information inline with your email?
>
> >
> > This results in line 7 being sorted incorrectly: sort -t , -k 1n < weird.csv
>
> Try using the --debug option to see what is really happening.  The bug
> is NOT in sort (which correctly obeyed your locale rules and incorrect
> command line), but in your command line (because you didn't tell sort
> where to quit parsing numbers).
>
> I'm going to distill it down to a smaller input that still expresses the
> same "swapped" lines:
>
> $ printf '1,73,67,6\n2,68,61,7\n1,69,55,14\n2,71,59,12\n' \
>  | sort -t, -k1n --debug
> sort: using ‘en_US.UTF-8’ sorting rules
> sort: key 1 is numeric and spans multiple fields
> 1,73,67,6
> _________
> _________
> 2,68,61,7
> _________
> _________
> 1,69,55,14
> __________
> __________
> 2,71,59,12
> __________
> __________
>
> See what's happening? The -k1n argument says to start parsing at field
> 1, but continue parsing until either the input is no longer numeric or
> until the end of line is reached (even if it goes into field 2 or
> beyond). Since commas are silently ignored in the en_US.UTF-8 locale
> when parsing a number, sort is thus comparing the values 268617 and
> 1695514, and the sort was correct.
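
One way to see the numbers being compared is to delete the commas by hand and sort what is left. This is only a sketch that mimics the effect described above; it assumes the comma is skipped as the locale's thousands separator, and the variable names are made up for illustration.

```shell
# Sample rows from the distilled example above.
input='1,73,67,6
2,68,61,7
1,69,55,14
2,71,59,12'

# Deleting the commas approximates the key that -k1n sees in en_US.UTF-8
# (assumption: ',' is treated as a thousands separator and skipped).
effective_keys=$(printf '%s\n' "$input" | tr -d ',' | LC_ALL=C sort -n)
printf '%s\n' "$effective_keys"
```

The second and third values printed are 268617 and 1695514, the same pair named in the explanation.
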
>
> Now, try telling sort that it must parse a numeric field, but must END
> the parse at the end of the first field (if not sooner due to end of
> number):
>
> $ printf '1,73,67,6\n2,68,61,7\n1,69,55,14\n2,71,59,12\n' \
>  | sort -t, -k1,1n --debug
> sort: using ‘en_US.UTF-8’ sorting rules
> 1,69,55,14
> _
> __________
> 1,73,67,6
> _
> _________
> 2,68,61,7
> _
> _________
> 2,71,59,12
> _
> __________
>
> Or try using a locale where ',' is NOT part of a valid number:
>
> $ printf '1,73,67,6\n2,68,61,7\n1,69,55,14\n2,71,59,12\n' \
>  | LC_ALL=C sort -t, -k1n --debug
> sort: using simple byte comparison
> sort: key 1 is numeric and spans multiple fields
> 1,69,55,14
> _
> __________
> 1,73,67,6
> _
> _________
> 2,68,61,7
> _
> _________
> 2,71,59,12
> _
> __________
>
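
In a script it may be safest to apply both fixes at once: pin the locale and bound each key to a single field. A minimal sketch follows; the second key is an assumption about the desired tie-break and was not part of the original report.

```shell
# Sample rows from the distilled example above.
input='1,73,67,6
2,68,61,7
1,69,55,14
2,71,59,12'

# -k1,1n ends the first key at the end of field 1;
# -k2,2n (an added assumption) breaks ties numerically on field 2.
# LC_ALL=C keeps ',' from being read as part of a number.
sorted=$(printf '%s\n' "$input" | LC_ALL=C sort -t, -k1,1n -k2,2n)
printf '%s\n' "$sorted"
```

Bounding every key this way keeps the result independent of whichever locale the caller happens to have set.
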
>
> >
> > This produced the expected results: cut -f , -d 1-3 < weird.csv | sort -t , -k 1n
>
> Actually, you mean 'cut -d, -f 1-3' (you transposed while transferring
> from the stackoverflow site to your email).  But yeah, when you truncate
> to a smaller number, you are comparing different values (17367 is less
> than 26861).
>
> >
> > Using 'g' instead of 'n' also produces the expected results, but I'm not
> > clear on what the difference is between 'g' and 'n'.
>
> -n is specified by POSIX as parsing integers according to the current
> locale's definition.  -g is a GNU extension, which says to parse
> floating point numbers.  Apparently, in the en_US.UTF-8 locale, parsing
> floating point stops at the first comma, while parsing integers does not:
>
> $ printf '1,73,67,6\n2,68,61,7\n1,69,55,14\n2,71,59,12\n' \
>  | sort -t, -k1g --debug
> sort: using ‘en_US.UTF-8’ sorting rules
> sort: key 1 is numeric and spans multiple fields
> 1,69,55,14
> _
> __________
> 1,73,67,6
> _
> _________
> 2,68,61,7
> _
> _________
> 2,71,59,12
> _
> __________
>
> I don't know why libc chose to make strtoll() ignore commas while
> strtold() does not, when not in the C locale.
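
Another visible difference between the two options is exponent notation, which can be checked without depending on any particular locale. The sample values below are hypothetical and not taken from the report; this is a small sketch assuming a sort that supports -g, such as GNU sort.

```shell
# Hypothetical sample values, not from the original report.
values='1e3
2
10'

# -n stops reading at the 'e', so '1e3' is compared as 1.
plain_numeric=$(printf '%s\n' "$values" | LC_ALL=C sort -n)

# -g reads '1e3' as a floating point number, i.e. 1000.
general_numeric=$(printf '%s\n' "$values" | LC_ALL=C sort -g)

printf '%s\n' '-n:' "$plain_numeric" '-g:' "$general_numeric"
```

With -n the line 1e3 sorts first; with -g it sorts last.
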
>
> But at any rate, I hope I've demonstrated that the bug was in your usage
> and not in sort.  So I'm closing this bug, although you should feel free
> to add further comments or questions.  You may also want to read the FAQ:
>
> https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Sort-does-not-sort-in-normal-order_0021
> [Hmm - we should update that FAQ to mention the --debug option]
>
> --
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
>
>
> ---------- Forwarded message ----------
> From: Ben Mendis <address@hidden>
> To: address@hidden
> Cc:
> Date: Tue, 11 Nov 2014 11:39:12 -0500
> Subject: Possible bug in sort
>
> http://stackoverflow.com/questions/26869717/why-does-sort-seem-to-sort-a-field-incorrectly-based-on-the-presence-or-absenc
>
> Data is here: https://gist.github.com/anonymous/2a7beb4871b25ae8f8b3
>
> This results in line 7 being sorted incorrectly: sort -t , -k 1n < weird.csv
>
> This produced the expected results: cut -f , -d 1-3 < weird.csv | sort -t , -k 1n
>
> Using 'g' instead of 'n' also produces the expected results, but I'm not
> clear on what the difference is between 'g' and 'n'.
>
> Tested with sort 8.21 on Slackware64-current.
>
>

