bug#33371: RFC: option for numeric sort: ignore-non-numeric characters
From: Erik Auerswald
Subject: bug#33371: RFC: option for numeric sort: ignore-non-numeric characters
Date: Mon, 19 Nov 2018 15:27:00 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.2.1
Hi,
On 11/19/18 02:08, L A Walsh wrote:
On 11/14/2018 12:27 AM, Erik Auerswald wrote:
On Tue, Nov 13, 2018 at 06:32:55PM -0800, L A Walsh wrote:
I have a bunch of files numbered from 1 to over 2000 without leading zeros
(think RFCs)...
They have names with a non-numeric prefix & suffix around the number.
Are prefix and suffix constant? RFC files are usually named rfc${NR}.txt.
It would be nice if sort had the option to ignore non-numeric
data and only sort on the numeric data in the 'lines'/'files'.
Perhaps --version-sort could work for you?
[...]
'sort -V' works by itself.
[...]
Or is there an option for this already, and is my manpage out of date?
AFAIK not exactly.
[...]
"-V" seems like it might be sufficient, but I doubt most
non-computer types would know that -V would sort multiple numeric fields
separated by invariant non-numeric characters in a numeric fashion
(or would even know how a version sort differs from the other sorts).
As far as I remember, the definition of --version-sort is to follow the
Debian GNU/Linux package version sorting rules. Those are based on
numbers surrounded by text, but several characters have special meaning
(e.g. '~' sorts before everything else, even before the empty string).
Thus this is _not_ a "natural sort," but quite specific and potentially
surprising.
$ printf -- 'foo\nbar\nfoo-bar\nfoo~bar\n' | sort --version-sort
bar
foo~bar
foo
foo-bar
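
To make the ordering above easier to follow, here is a minimal Python
sketch of a Debian-style comparison as I understand the rules: letters
sort before other characters, '~' sorts before everything including the
end of the string, and digit runs compare as numbers. The names
debian_style_cmp, _order, _at and _is_digit are made up for this
illustration, ASCII input is assumed, and this is not the code sort
itself uses, which may differ in details.

```python
from functools import cmp_to_key


def _is_digit(ch):
    """True for an ASCII digit; None (end of string) is not a digit."""
    return ch is not None and "0" <= ch <= "9"


def _at(s, i):
    """Character at index i, or None once past the end of the string."""
    return s[i] if i < len(s) else None


def _order(ch):
    """Rank one character inside a non-digit run."""
    if ch == "~":
        return -1            # tilde sorts before everything, even the end
    if ch is None or _is_digit(ch):
        return 0             # end of string or start of a number
    if ch.isalpha():
        return ord(ch)       # letters sort before other characters
    return ord(ch) + 256     # punctuation such as '-' or '.'


def debian_style_cmp(a, b):
    """Negative, zero or positive, like strcmp (simplified sketch)."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Compare the non-digit runs character by character.
        while (i < len(a) and not _is_digit(a[i])) or \
              (j < len(b) and not _is_digit(b[j])):
            ra, rb = _order(_at(a, i)), _order(_at(b, j))
            if ra != rb:
                return ra - rb
            i += 1
            j += 1
        # Compare the following digit runs as integers.
        si, sj = i, j
        while _is_digit(_at(a, i)):
            i += 1
        while _is_digit(_at(b, j)):
            j += 1
        na, nb = int(a[si:i] or "0"), int(b[sj:j] or "0")
        if na != nb:
            return na - nb
    return 0


names = ["foo", "bar", "foo-bar", "foo~bar"]
print(sorted(names, key=cmp_to_key(debian_style_cmp)))
# ['bar', 'foo~bar', 'foo', 'foo-bar']
```

With these rules "foo~bar" lands before "foo" because the tilde ranks
below the end of the string, while "foo-bar" lands after it.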
Given how well read docs are these days, we almost need a literal definition
of 'version sort' besides just calling it a 'version sort' (which we
must admit, is 'jargon').
I think it is worse than jargon, because it is specific to one kind of
version numbering scheme.
Along the lines of:
--version-sort | -V Sees inputs as a mix of numeric and alphabetic
(or "identifier") fields, where the numeric fields are sorted
naturally, and alpha fields sorted alphabetically. Fields may have
separators like '.', '_', or '-', sometimes constrained by a specific
computer language, or may have no separator at all between numeric
and alpha fields. This type of sort is often called a
"version sort" in the computer field.
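
For comparison, the wording proposed here reads to me like a plain
natural sort. A minimal Python sketch of that reading, where natural_key
is a hypothetical helper made up for this illustration and only ASCII
digits are treated as numeric:

```python
import re


def natural_key(line):
    """Split a line into alternating text and number fields.

    re.split with a capturing group returns text at even indexes and
    digit runs at odd indexes, so keys always compare text with text
    and numbers with numbers.
    """
    parts = re.split(r"([0-9]+)", line)
    return [int(p) if idx % 2 else p for idx, p in enumerate(parts)]


print(sorted(["rfc10.txt", "rfc2.txt", "rfc1.txt"], key=natural_key))
# ['rfc1.txt', 'rfc2.txt', 'rfc10.txt']

print(sorted(["foo", "bar", "foo-bar", "foo~bar"], key=natural_key))
# ['bar', 'foo', 'foo-bar', 'foo~bar']
```

The second result differs from the 'sort --version-sort' output shown
earlier, where "foo~bar" comes before "foo".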
Thus I am not sure about your suggestion above. :-/
??? I listed 'version sort' at the end, as the equivalence so those who
tend to skip and read initial parts of lines/paragraphs would not just see
"version sort" and gloss over the rest, inserting their own equivalence
for the definition -- especially likely w/"version-sort" being the long
form of the switch.
I like that strategy. :-)
Thanks,
Erik