bug#33371: RFC: option for numeric sort: ignore-non-numeric characters
From: L A Walsh
Subject: bug#33371: RFC: option for numeric sort: ignore-non-numeric characters
Date: Sun, 18 Nov 2018 17:08:00 -0800
User-agent: Thunderbird
On 11/14/2018 12:27 AM, Erik Auerswald wrote:
Hi,
On Tue, Nov 13, 2018 at 06:32:55PM -0800, L A Walsh wrote:
I have a bunch of files numbered from 1 to over 2000 without leading zeros
(think RFCs)...
They have names with a non-numeric prefix & suffix around the number.
Are prefix and suffix constant? RFC files are usually named rfc${NR}.txt.
It would be nice if sort had the option to ignore non-numeric
data and only sort on the numeric data in the 'lines'/'files'.
Perhaps --version-sort could work for you?
$ for r in rfc{1..100}.txt; do echo "$r"; done | sort | sort -V
(The first sort un-sorts the sorted input data, the second sorts it
again.)
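A minimal sketch of the difference between the two sorts, assuming GNU sort (for -V) and a hypothetical handful of RFC-style names rather than real files. LC_ALL=C is set only to make the plain sort's byte-wise ordering predictable:

```shell
# Generate a few RFC-style names (hypothetical sample, not real files).
names=$(printf 'rfc%s.txt\n' 1 2 10 11 100)

# Plain sort compares character by character, so "rfc10" precedes "rfc2".
lexical=$(printf '%s\n' "$names" | LC_ALL=C sort)

# Version sort compares the embedded digit runs as numbers.
versioned=$(printf '%s\n' "$names" | LC_ALL=C sort -V)

printf 'sort:\n%s\n\nsort -V:\n%s\n' "$lexical" "$versioned"
```

With this sample the plain sort places rfc10.txt and rfc100.txt ahead of rfc2.txt, while sort -V keeps the names in numeric order.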
-----
Tried this... I was initially put off by the use of a for loop to
list files when '/bin/ls -1 *.txt' was so much shorter. However,
'sort -V' works by itself.
I'm not sure exactly why, but that wasn't initially clear to
me, though maybe it should have been, given that I have written a
version sort more than once before.
One minor gotcha: with single-digit numbers, the for loop produced:
rfc1.txt
rfc2.txt
rfc3.txt
rfc4.txt
rfc5.txt
rfc6.txt
rfc7.txt
rfc8.txt
rfc9.txt
while the '/bin/ls -1 rfc?.txt | sort -V' pipeline produced:
rfc1.txt
rfc2.txt
rfc3.txt
rfc4.txt
rfc5.txt
rfc6.txt
----
(7-9 didn't exist in the directory)
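The gap comes from how the names are produced, not from the sort: a loop over generated names prints every name, while a glob expands only to files that exist. A small sketch of that, assuming a hypothetical scratch directory that holds only rfc1.txt through rfc6.txt, and GNU sort for -V:

```shell
# Hypothetical scratch directory holding only rfc1.txt .. rfc6.txt.
dir=$(mktemp -d)
for n in 1 2 3 4 5 6; do : > "$dir/rfc$n.txt"; done

# A loop over generated names prints all nine, whether or not the files exist.
generated=$(for n in 1 2 3 4 5 6 7 8 9; do echo "rfc$n.txt"; done)

# A glob expands only to names that match existing files.
globbed=$(cd "$dir" && ls -1 rfc?.txt | sort -V)

echo "generated: $(printf '%s\n' "$generated" | wc -l) names"
echo "globbed:   $(printf '%s\n' "$globbed" | wc -l) names"
rm -r "$dir"
```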
[...]
Or is there an option for this already, and is my manpage out of date?
AFAIK not exactly.
Thanks,
Erik
----
"-V" seems like it might be sufficient, but I doubt most
non-computer types would know that -V sorts multiple numeric fields
separated by invariant non-numeric characters in a numeric fashion
(or would even know how a version sort differs from the other sorts).
Given how well-read docs are these days, we almost need a literal definition
of 'version sort' besides just calling it a 'version sort' (which, we
must admit, is jargon). Along the lines of:
--version-sort | -V
Sees inputs as a mix of numeric and alphabetic (or "identifier")
fields, where the numeric fields are sorted naturally, and alpha
fields sorted alphabetically. Fields may have separators like
'.', '_', or '-', sometimes constrained by a specific computer
language, or may have no separator at all between numeric and
alpha fields. This type of sort is often called a
"version sort" in the computer field.
I listed 'version sort' at the end, as the equivalence, so that those who tend
to skim and read only the initial parts of lines/paragraphs would not just see
"version sort" and gloss over the rest, inserting their own equivalence
for the definition -- especially likely w/"version-sort" being the long form
of the switch.
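The behaviour the proposed wording describes (several numeric fields separated by fixed non-numeric characters, each compared as a number) can be seen in a short sketch. It assumes GNU sort, and the "pkg-" names are made up for illustration:

```shell
# Hypothetical names with three numeric fields separated by '.'.
names=$(printf '%s\n' pkg-1.10.2 pkg-1.9.10 pkg-2.0.0 pkg-1.9.2)

# Byte-wise sort: "1.10" sorts before "1.9" because '1' < '9'.
lexical=$(printf '%s\n' "$names" | LC_ALL=C sort)

# Version sort: each digit run is compared numerically, field by field.
versioned=$(printf '%s\n' "$names" | LC_ALL=C sort -V)

printf 'sort:\n%s\n\nsort -V:\n%s\n' "$lexical" "$versioned"
```

Here sort -V orders the names as 1.9.2, 1.9.10, 1.10.2, 2.0.0, whereas the plain sort puts 1.10.2 first.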