bug-coreutils

bug#33371: RFC: option for numeric sort: ignore-non-numeric characters


From: L A Walsh
Subject: bug#33371: RFC: option for numeric sort: ignore-non-numeric characters
Date: Sun, 18 Nov 2018 17:08:00 -0800
User-agent: Thunderbird



On 11/14/2018 12:27 AM, Erik Auerswald wrote:
> Hi,
>
> On Tue, Nov 13, 2018 at 06:32:55PM -0800, L A Walsh wrote:
> > I have a bunch of files numbered from 1 to over 2000 without leading zeros
> > (think RFCs)...
> > They have names with a non-numeric prefix & suffix around the number.
>
> Are prefix and suffix constant? RFC files are usually named rfc${NR}.txt.
>
> > It would be nice if sort had the option to ignore non-numeric
> > data and only sort on the numeric data in the 'lines'/'files'.
>
> Perhaps --version-sort could work for you?
>
> $ for r in rfc{1..100}.txt; do echo "$r"; done | sort | sort -V
>
> (The first sort un-sorts the sorted input data, the second sorts it
> again.)
-----
        Tried this... I was initially put off by the for loop used to
list the files when '/bin/ls -1 *.txt' is so much shorter.  However,
'sort -V' works by itself.
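
To make the difference concrete, here is a minimal sketch comparing a plain sort with 'sort -V'. The file names are made up for illustration, and it assumes GNU sort, which provides -V (--version-sort):

```shell
# Hypothetical names, deliberately out of order; assumes GNU sort for -V.
names='rfc10.txt
rfc2.txt
rfc1.txt
rfc100.txt'

# Plain sort compares character by character, so "rfc10" lands before "rfc2".
plain=$(printf '%s\n' "$names" | LC_ALL=C sort)

# Version sort compares the embedded runs of digits as numbers.
versioned=$(printf '%s\n' "$names" | LC_ALL=C sort -V)

printf 'plain sort:\n%s\n\nsort -V:\n%s\n' "$plain" "$versioned"
```

The plain sort puts rfc10.txt and rfc100.txt ahead of rfc2.txt, while 'sort -V' yields 1, 2, 10, 100.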

I'm not sure exactly why that wasn't clear to me at first, though
maybe it should have been, having written a version sort more than
once before.
One minor gotcha: using single-digit numbers, the for loop produced:
rfc1.txt
rfc2.txt
rfc3.txt
rfc4.txt
rfc5.txt
rfc6.txt
rfc7.txt
rfc8.txt
rfc9.txt

while the '/bin/ls -1 rfc?.txt|sort -V' pipeline produced:
rfc1.txt
rfc2.txt
rfc3.txt
rfc4.txt
rfc5.txt
rfc6.txt
----
(7-9 didn't exist in the directory)
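
The difference comes from where the names originate: the loop prints names whether or not the files exist, while a glob only matches files that are present. A rough sketch in a scratch directory (the directory and its six files are a hypothetical setup; 'seq' stands in for bash brace expansion so the loop also runs under a plain POSIX shell; GNU sort assumed for -V):

```shell
# Hypothetical setup: a scratch directory holding only rfc1.txt .. rfc6.txt.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch rfc1.txt rfc2.txt rfc3.txt rfc4.txt rfc5.txt rfc6.txt

# The loop just prints strings; it never checks the file system.
generated=$(for n in $(seq 1 9); do echo "rfc$n.txt"; done | sort -V)

# The glob expands only to names of files that actually exist.
existing=$(ls -1 rfc?.txt | sort -V)

generated_count=$(printf '%s\n' "$generated" | wc -l)
existing_count=$(printf '%s\n' "$existing" | wc -l)
echo "loop printed $generated_count names, glob matched $existing_count files"
```

So the loop is handy for generating test input, but only the glob (or ls) reflects what is really in the directory.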

> [...]
> > Or is there an option for this already, and my manpage out of date?
>
> AFAIK not exactly.
>
> Thanks,
> Erik
----
        "-V" seems like it might be sufficient, but I doubt most
non-computer types would know that -V sorts multiple numeric fields
separated by invariant non-numeric characters in a numeric fashion
(or would even know how a version sort differs from the other sorts).

Given how well read docs are these days, we almost need a literal definition
of 'version sort' besides just calling it a 'version sort' (which, we
must admit, is jargon). Along the lines of:

--version-sort | -V
      Sees inputs as a mix of numeric and alphabetic (or "identifier")
      fields, where the numeric fields are sorted naturally, and alpha
      fields sorted alphabetically.  Fields may have separators like
      '.', '_', or '-', sometimes constrained by a specific computer
      language, or may have no separator at all between numeric and
      alpha fields.  This type of sort is often called a "version sort"
      in the computer field.
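
As a rough illustration of that wording, the following sketch feeds 'sort -V' a few made-up identifiers that mix alphabetic and numeric fields with different separators (GNU sort assumed; the names are hypothetical):

```shell
# Hypothetical identifiers with '_', '-', and '.' between fields.
items='lib_10-1
lib_2-10
app_3.1
lib_2-9'

# Alphabetic fields compare as text, numeric fields compare by value:
# "app" sorts before "lib", 2 before 10, and 9 before 10.
sorted=$(printf '%s\n' "$items" | LC_ALL=C sort -V)

printf '%s\n' "$sorted"
```

This prints app_3.1, lib_2-9, lib_2-10, lib_10-1, one per line: each numeric field is ordered by value rather than by its leading digit.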

???  I listed 'version sort' at the end, as the equivalence, so that those
who tend to skim and read only the initial parts of lines/paragraphs would
not just see "version sort" and gloss over the rest, substituting their own
idea of the definition -- especially likely with "version-sort" being the
long form of the switch.










