bug-coreutils

bug#33371: RFC: option for numeric sort: ignore-non-numeric characters


From: L A Walsh
Subject: bug#33371: RFC: option for numeric sort: ignore-non-numeric characters
Date: Wed, 14 Nov 2018 22:24:58 -0800
User-agent: Thunderbird



On 11/13/2018 6:44 PM, Eric Blake wrote:
On 11/13/18 8:32 PM, L A Walsh wrote:
I have a bunch of files numbered from 1 to over 2000 without leading zeros
(think RFCs)...
They have names with a non-numeric prefix & suffix around the number.

It would be nice if sort had the option to ignore non-numeric
data and only sort on the numeric data in the 'lines'/'files'.

Yeah, I can renumber and rename them all, but I just wanted
an instant command that could sort numeric values even if embedded
in a line, where the "field" was determined by the start/stop of
numeric characters.

Or is there an option for this already, and my manpage is out of date?

Without ACTUAL data to experiment with, it's much harder for anyone else to propose a solution that will work with your specific data.
----
...think RFCs... Um, have you ever looked at a directory with a bunch (all or most) of the RFCs in it?



But one quick approach comes to mind: decorate-sort-undecorate:

sed 's/^\([^0-9]*\)\([0-9]*\)/\2 \1\2/' < myinput \
   | sort -k1,1n | sed 's/^[0-9]* //' > myoutput
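
To make the pipeline above concrete, here is a minimal sketch of the same decorate-sort-undecorate steps run on a few sample names. The function name sort_embedded and the four file names are hypothetical, chosen only for illustration; the sed and sort commands are the ones shown above, reading from stdin instead of myinput.

```shell
# Hypothetical helper wrapping the pipeline above; reads lines on stdin.
sort_embedded() {
    # Decorate: copy the first run of digits to the front of the line.
    sed 's/^\([^0-9]*\)\([0-9]*\)/\2 \1\2/' \
        | sort -k1,1n \
        | sed 's/^[0-9]* //'   # Undecorate: drop the copied number and space.
}

# Hypothetical sample names, RFC-style, without leading zeros.
printf '%s\n' rfc979.txt rfc98.txt rfc2000.txt rfc1.txt | sort_embedded
# Prints, one per line: rfc1.txt rfc98.txt rfc979.txt rfc2000.txt
```

Only the first run of digits on each line is used as the key, so names that carry a second number later in the line would need a different decorate step.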

----
        That does work, but it still seems a bit odd for a numeric
sort not to ignore, even by default, non-numeric data before or after the number.

        I may be imagining this, but I thought I'd seen some version of sort
that did this -- simply skipping the non-numeric characters and sorting on the
numbers.

        Instead this sort reverted to alpha sort.  Thinking about
it...if I ask for numeric sort, shouldn't it at least try to look for
numbers in each line to sort them?

        That seems like it might be a user-friendly and even consistent
thing to do, considering there are options to
1) ignore leading blanks
2) ignore case
3) ignore nonprinting characters (this most closely parallels the request, since when doing an alpha sort, one might hope it could ignore what isn't visible).
4) "human sort" (actually this option sorta makes it look like a
bug, since this sort ignores things that don't look like a number+suffix).
So why wouldn't numeric sort do the same?

I'd even sorta hoped the -h sort might work for this, since
if you were showing sizes and only had values in bytes, you wouldn't see
the suffixes. So I'd hoped that it would order rfc98.txt before rfc979.txt, but such is not the case.

I.e., in the case of 'ls', it ignores junk before and after the numbers (plus optional unit). So one might wonder why it doesn't properly sort the numbers with 'rfc' before them and '.txt' after them. I.e., should option 4 have worked, maybe? Might be a bit perverse, but I can't see why not.
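
The ordering complained about here can be reproduced with just the two names mentioned. This is a small sketch, assuming GNU sort and the C locale (the locale is an assumption; the message does not say which one was in use). Neither line starts with a number, so -n and -h see equal keys and the order falls back to a plain byte comparison of the whole line, where '7' sorts before '8'. The helper name "names" is hypothetical.

```shell
# Hypothetical helper that emits the two names from the message.
names() { printf '%s\n' rfc98.txt rfc979.txt; }

# Numeric sort: no leading number on either line, so the keys compare equal
# and the byte-wise fallback puts rfc979.txt first.
names | LC_ALL=C sort -n

# Human-numeric sort: same result for the same reason.
names | LC_ALL=C sort -h
```

Both commands list rfc979.txt ahead of rfc98.txt, which is the "alpha sort" result described earlier in this message.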



        





