bug#17188: Sort bugs
From: Nikos Balkanas
Subject: bug#17188: Sort bugs
Date: Sat, 5 Apr 2014 21:42:31 +0300
On Sat, Apr 5, 2014 at 3:21 PM, Eric Blake <address@hidden> wrote:
> tag 17188 notabug
> thanks
>
> On 04/04/2014 08:07 PM, Nikos Balkanas wrote:
> > Hi,
> >
> > Sort is seriously bugged. This is the output from:
> >
> > sort -d -t \t -k1 input > out
>
> -d says to do a dictionary sort that ignores non-alphanumeric
> characters. But it still leaves it up to your current locale on whether
> those non-alpha characters are collated case-insensitively.
>
> Also, '-k1' is almost always wrong - you generally want '-k1,1' if you
> want to sort by JUST the first field, rather than by the whole line.
>
Sorting by the first line? What is that? Sort should work on each line, by
the given columns.
The Unix man page says:
KEYDEF is F[.C][OPTS][,F[.C][OPTS]] for start and stop position, where F
is a field number and C a character position in the field; both are
origin 1, and the stop position defaults to the line's end...
In retrospect this confirms what you say. On first reading, however, it
doesn't make sense. An example like the one you gave me, in the man page,
would save a lot of explaining.
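A minimal sketch of what such an example could look like, assuming GNU sort
and a POSIX shell. The three-line sample and the variable names are made up
for illustration, and -s (stable sort) is added only so that the difference
between the two keys stays visible in the output:

```shell
# Hypothetical three-line sample; fields are separated by a real tab.
# Note: an unquoted \t on the command line reaches sort as the letter "t".
tab=$(printf '\t')
sample=$(printf 'b\t1\na\t2\na\t1')

# -k1: the key runs from field 1 to the end of the line.
whole_line=$(printf '%s\n' "$sample" | LC_ALL=C sort -s -t "$tab" -k1)

# -k1,1: the key is field 1 only; -s keeps tied lines in input order.
first_field=$(printf '%s\n' "$sample" | LC_ALL=C sort -s -t "$tab" -k1,1)

printf '%s\n---\n%s\n' "$whole_line" "$first_field"
```

With -k1 the two "a" lines are ordered by their second field as well; with
-k1,1 they compare equal and stay in their input order.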
> See the FAQ:
>
> https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Sort-does-not-sort-in-normal-order_0021
>
>
From that link:
"So far there is still no fully satisfactory solution to this problem. If
you find one then please contact me so that this information can be listed."
If you are "me", then I would like to suggest that you make the legacy
sort behaviour the default, and put the locale support that standards and
non-English users ask for behind -c.
>
> > 0009rN2S3cohd2DGH6yuTWBoeuq6DwWZhCBDEnFzYqpw984FfALy7NUhEZH1.YEbiq/
> > 000EMQeKUjtyXIOaUkT.XE6SaBIdOqTA0nffF394V6tkcVdup2c3ihi7yhbuRof2Y5agTG
> > 000p8kXIz5Tc1GaxYYXjAfgm7YJOZvyBJxVXMi0lhaJXT22IdDbE6vVhWXW9FkRBxQ
> > 00/0QwzaXrqGHXW7mE9Le8IIVgHoZvccgGydKdzJgh8.SZenbULmIWMtrGShz24W7T
> > 000R2cnZ8.khe1eXDERclkbXASRQeKvcNBaCJRLX617Xvmff0KaoZSSFBNhNG1OiIyr
> >
> > Shouldn't 00/0 be first according to Ascii code?
>
I have to sort billions of these hashes, in files that are terabytes in
size. Each hash is followed by a <TAB> and a key (hence the -t \t).
I was considering downloading legacy sort sources and compiling them for my
system, or taking recent sources and patching them. Both are dreadful
prospects, because they would make my system incompatible and inconsistent.
You don't know how happy you make me, that I can still get legacy behaviour
out of the modern releases.
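For the record, a small sketch of how the legacy ordering can be obtained,
assuming GNU sort and a POSIX shell. It uses only the first eight characters
of the five hashes quoted above, and the variable names are illustrative:

```shell
# First eight characters of the five hashes quoted above (shortened).
hashes='0009rN2S
000EMQeK
000p8kXI
00/0Qwza
000R2cnZ'

# LC_ALL=C makes sort compare raw byte values. -d is left out on purpose,
# because dictionary order would skip the "/" when comparing.
sorted=$(printf '%s\n' "$hashes" | LC_ALL=C sort)
printf '%s\n' "$sorted"
```

In this byte-wise order "/" (0x2F) sorts before "0" (0x30), so the 00/0 line
comes first, and uppercase letters sort before lowercase ones.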
> There's nothing to fix but your usage pattern. So I'm closing this as
> not a bug. But feel free to reply further if you still have questions.
>
The UI is still a bug, though not a code bug, and legacy UI compatibility is
broken. However, I am perfectly satisfied with your fast and detailed
explanation of the current status.
You will, however, go crazy if you respond like that to every user with a
locale sorting issue. Can't you make the C locale (LC_ALL=C) the default for
sorting, and let users switch to the system settings with -c when they need
it? Nowadays users sort with graphical tools; sort is used mostly by
scripts.
Thank you,
Nikos
>
>
> --
> Eric Blake eblake redhat com +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
>