decorate - new sorting-helper program (experimental)
From: Assaf Gordon
Subject: decorate - new sorting-helper program (experimental)
Date: Mon, 13 Apr 2020 13:14:55 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.6.0
Hello,
I'm happy to announce the first experimental release of the "decorate"
program.
'decorate' works in tandem with coreutils' sort(1) to allow new sorting
methods (e.g. IP addresses, roman numerals, string lengths).
This is a new program but an old idea, suggested by Pádraig here:
https://lists.gnu.org/r/bug-coreutils/2015-06/msg00076.html
---
The program is part of the "datamash" package and is available here:
https://alpha.gnu.org/gnu/datamash/datamash-1.5.17-735b.tar.gz
"./configure && make" should give you the "decorate" executable.
The rest of this (long) email shows usage information and examples.
This is an experimental version, and everything could still change.
Comments, suggestions and feedback are *very* welcome.
regards,
- assaf
----------------------------------------------------
#### General Usage ####
The general idea is:
1. Convert a field of the input to a format that sort(1) can easily
   sort, e.g., roman numerals to their decimal equivalent, or IPv4
   addresses to 32-bit hex values.
2. Pass this converted (=decorated) input to sort.
3. Remove (=undecorate) the converted fields.
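For illustration only, here is a minimal in-process C sketch of the same three steps (it is not decorate's code). It assumes a hypothetical "line length" key, formatted as the same 21-digit zero-padded number that decorate prints, and uses qsort() with strcmp() as a stand-in for sort(1) running under LC_ALL=C. The function names are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L /* strdup() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Step 1 (decorate): prefix the line with its length as a 21-digit,
   zero-padded number, so byte-wise order equals numeric order. */
static char *decorate_line(const char *line)
{
    size_t size = strlen(line) + 23; /* 21 digits + ' ' + '\0' */
    char *out = malloc(size);
    assert(out != NULL);
    snprintf(out, size, "%021zu %s", strlen(line), line);
    return out;
}

/* Step 2 (sort): plain byte comparison, like sort(1) with LC_ALL=C. */
static int compare_lines(const void *a, const void *b)
{
    return strcmp(*(char *const *)a, *(char *const *)b);
}

/* Step 3 (undecorate): drop the first field and its separator. */
static const char *undecorate_line(const char *decorated)
{
    const char *space = strchr(decorated, ' ');
    return space != NULL ? space + 1 : decorated;
}

/* Sort n lines by length; out[i] receives a malloc'd copy of each line. */
static void sort_by_length(const char *const *lines, size_t n, char **out)
{
    if (n == 0)
        return;
    char **tmp = malloc(n * sizeof *tmp);
    assert(tmp != NULL);
    for (size_t i = 0; i < n; i++)
        tmp[i] = decorate_line(lines[i]);
    qsort(tmp, n, sizeof *tmp, compare_lines);
    for (size_t i = 0; i < n; i++) {
        out[i] = strdup(undecorate_line(tmp[i]));
        assert(out[i] != NULL);
        free(tmp[i]);
    }
    free(tmp);
}
```

Zero-padding is what makes step 2 work: with a fixed width, comparing the digits byte by byte gives the same order as comparing the numbers. Lines of equal length end up ordered by the rest of the line.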
Example 1:
### convert roman-numerals, add new field
$ printf "%s\n" C V III IX XI | ./decorate -k1,1:roman --decorate
000000000000000000100 C
000000000000000000005 V
000000000000000000003 III
000000000000000000009 IX
000000000000000000011 XI
### combine decorate-sort-undecorate
$ printf "%s\n" C V III IX XI \
| ./decorate -k1,1:roman --decorate \
| sort -k1,1 \
| ./decorate --undecorate 1
III
V
IX
XI
C
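The roman conversion itself could look roughly like this sketch (hypothetical helper names, not the code in src/decorate-functions.c). It assumes upper-case input and applies the usual subtractive rule, so non-canonical forms such as IC are read as 99.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Value of a single roman digit, or -1 if c is not one. */
static long roman_digit(char c)
{
    switch (c) {
    case 'I': return 1;
    case 'V': return 5;
    case 'X': return 10;
    case 'L': return 50;
    case 'C': return 100;
    case 'D': return 500;
    case 'M': return 1000;
    default:  return -1;
    }
}

/* Value of a roman numeral, or -1 on empty/invalid input. A digit smaller
   than the digit to its right is subtracted (IX = 9, IC = 99). */
static long roman_value(const char *s)
{
    if (*s == '\0')
        return -1;
    long total = 0;
    for (; *s != '\0'; s++) {
        long cur = roman_digit(s[0]);
        long next = s[1] != '\0' ? roman_digit(s[1]) : 0;
        if (cur < 0 || next < 0)
            return -1;
        total += cur < next ? -cur : cur;
    }
    return total;
}

/* Write the decoration (21-digit zero-padded decimal) into buf.
   Returns 0 on success, -1 on invalid input or a too-small buffer. */
static int decorate_roman(const char *field, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    long value = roman_value(field);
    if (value < 0)
        return -1;
    int n = snprintf(buf, size, "%021ld", value);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```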
#### Easy/automatic 'decorate-sort-undecorate' method ####
Since the decorate-sort-undecorate pattern is repetitive,
the "decorate" program can execute 'decorate + sort + undecorate'
automatically (forking + piping to sort and back).
This is done when the "--decorate" and "--undecorate" options are *not*
specified (i.e., decorate is used as a 'sort' wrapper):
$ printf "%s\n" C V III IX XI | ./decorate -k1,1:roman
III
V
IX
XI
C
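A rough sketch of how such a wrapper could drive sort(1) through POSIX pipe(), fork(), execlp() and waitpid(). The function name and the fixed "sort -k1,1" arguments are assumptions for this example, and error handling is kept minimal.

```c
#define _POSIX_C_SOURCE 200809L /* getline(), fdopen() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Feed already-decorated lines to a child "sort -k1,1", read the sorted
   result back and write it to 'out' with the first field removed.
   Returns sort's exit status, or -1 on error. */
static int sort_and_undecorate(const char *const *decorated, size_t n, FILE *out)
{
    int to_sort[2], from_sort[2];
    if (pipe(to_sort) == -1)
        return -1;
    if (pipe(from_sort) == -1) {
        close(to_sort[0]);
        close(to_sort[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        close(to_sort[0]);
        close(to_sort[1]);
        close(from_sort[0]);
        close(from_sort[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: stdin <- to_sort, stdout -> from_sort, then become sort. */
        dup2(to_sort[0], STDIN_FILENO);
        dup2(from_sort[1], STDOUT_FILENO);
        close(to_sort[0]);
        close(to_sort[1]);
        close(from_sort[0]);
        close(from_sort[1]);
        execlp("sort", "sort", "-k1,1", (char *)NULL);
        _exit(127); /* exec failed */
    }

    /* Parent: close the ends that belong to the child. */
    close(to_sort[0]);
    close(from_sort[1]);

    FILE *w = fdopen(to_sort[1], "w");
    FILE *r = fdopen(from_sort[0], "r");
    if (w == NULL || r == NULL) {
        if (w != NULL) fclose(w); else close(to_sort[1]);
        if (r != NULL) fclose(r); else close(from_sort[0]);
        waitpid(pid, NULL, 0);
        return -1;
    }

    /* sort(1) reads all of its input before printing anything, so writing
       everything first and reading afterwards cannot deadlock. */
    for (size_t i = 0; i < n; i++)
        fprintf(w, "%s\n", decorated[i]);
    fclose(w); /* EOF tells sort to start producing output */

    /* Undecorate: drop everything up to and including the first space. */
    char *line = NULL;
    size_t cap = 0;
    while (getline(&line, &cap, r) != -1) {
        const char *space = strchr(line, ' ');
        fputs(space != NULL ? space + 1 : line, out);
    }
    free(line);
    fclose(r);

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Writing all input before reading any output is safe only because sort(1) cannot print a line before it sees EOF on its input. If the child exits early, the parent would also receive SIGPIPE while writing, which real code should handle.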
#### Conversion Syntax ####
The -k/--key specification follows sort(1), with the addition
of an optional conversion function name after a ":" (colon).
Examples:
$ printf "MMXX III\n" | ./decorate --decorate -k1,1:roman
000000000000000002020 MMXX III
$ printf "MMXX III\n" | ./decorate --decorate -k1.2,1:roman
000000000000000001020 MMXX III
$ printf "MMXX III\n" | ./decorate --decorate -k1,1:strlen
000000000000000000004 MMXX III
$ printf "MMXX III\n" | ./decorate --decorate -k1:strlen
000000000000000000008 MMXX III
The "r" (=reverse) flag can also be used:
$ printf "%s\n" X I IV IX VI | ./decorate -k1,1:roman
I
IV
VI
IX
X
$ printf "%s\n" X I IV IX VI | ./decorate -k1,1r:roman
X
IX
VI
IV
I
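A key argument can be split into its sort(1) part and its conversion name with a few lines of C. The struct, the field sizes and the function names below are hypothetical; only the "SORTKEY:CONVERSION" syntax comes from the examples above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical parsed form of a key such as "1.2,1r:roman". */
struct key_spec {
    char sort_part[64];  /* "1.2,1r": same syntax as sort(1) -k   */
    char conversion[32]; /* "roman"; empty when there is no ':'  */
};

/* Split a -k argument at its ':'. Returns 0 on success, -1 if either
   part does not fit. */
static int parse_key_spec(const char *arg, struct key_spec *ks)
{
    assert(arg != NULL && ks != NULL);
    const char *colon = strchr(arg, ':');
    size_t sort_len = colon != NULL ? (size_t)(colon - arg) : strlen(arg);
    const char *conv = colon != NULL ? colon + 1 : "";

    if (sort_len >= sizeof ks->sort_part || strlen(conv) >= sizeof ks->conversion)
        return -1;
    memcpy(ks->sort_part, arg, sort_len);
    ks->sort_part[sort_len] = '\0';
    strcpy(ks->conversion, conv);
    return 0;
}

/* True if the sort part carries the 'r' (reverse) flag. */
static int key_is_reversed(const struct key_spec *ks)
{
    return strchr(ks->sort_part, 'r') != NULL;
}
```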
Available conversion methods:
as-is copy as-is
roman roman numerals
strlen length (in bytes) of the specified field
ipv4 dotted-decimal IPv4 addresses
ipv6 IPv6 addresses
ipv4inet number-and-dots IPv4 addresses (incl. octal, hex values)
Examples:
$ printf "%s\n" 10.2.3.4 8.9.7.3 | ./decorate --decorate -k1,1:ipv4
0A020304 10.2.3.4
08090703 8.9.7.3
$ printf "%s\n" 10.010.0x10.10 192.168 \
| ./decorate --decorate -k1,1:ipv4inet
0A08100A 10.010.0x10.10
C00000A8 192.168
$ printf "%s\n" :: 2000::1234 ::ffff:192.168.1.42 \
| ./decorate --decorate -k1,1:ipv6
0000:0000:0000:0000:0000:0000:0000:0000 ::
2000:0000:0000:0000:0000:0000:0000:1234 2000::1234
0000:0000:0000:0000:0000:FFFF:C0A8:012A ::ffff:192.168.1.42
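These outputs are consistent with the standard address parsers: inet_pton(AF_INET) accepts only dotted-decimal, inet_aton() also accepts the older number-and-dots forms (octal and hex parts, fewer than four parts), and inet_pton(AF_INET6) handles IPv6 including embedded IPv4. Whether decorate uses exactly these functions is an assumption; the sketch below (hypothetical function names) only shows how fixed-width, upper-case hex makes the results sort correctly byte by byte. Note that inet_aton() is a BSD/glibc extension rather than strict POSIX.

```c
#define _DEFAULT_SOURCE /* inet_aton() on glibc */
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

/* ipv4: strict dotted-decimal only (inet_pton). */
static int decorate_ipv4(const char *field, char *buf, size_t size)
{
    struct in_addr a;
    assert(size >= 9); /* 8 hex digits + '\0' */
    if (inet_pton(AF_INET, field, &a) != 1)
        return -1;
    snprintf(buf, size, "%08lX", (unsigned long)ntohl(a.s_addr));
    return 0;
}

/* ipv4inet: also accepts octal/hex parts and short forms (inet_aton). */
static int decorate_ipv4inet(const char *field, char *buf, size_t size)
{
    struct in_addr a;
    assert(size >= 9);
    if (inet_aton(field, &a) == 0)
        return -1;
    snprintf(buf, size, "%08lX", (unsigned long)ntohl(a.s_addr));
    return 0;
}

/* ipv6: eight zero-padded 16-bit groups, separated by ':'. */
static int decorate_ipv6(const char *field, char *buf, size_t size)
{
    struct in6_addr a;
    assert(size >= 40); /* 8 * 4 hex digits + 7 ':' + '\0' */
    if (inet_pton(AF_INET6, field, &a) != 1)
        return -1;
    char *p = buf;
    for (int i = 0; i < 8; i++) {
        unsigned group = ((unsigned)a.s6_addr[2 * i] << 8) | a.s6_addr[2 * i + 1];
        p += sprintf(p, i == 0 ? "%04X" : ":%04X", group);
    }
    return 0;
}
```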
#### Mixing -k/--key for decorating and sorting ####
When 'decorate' automatically runs sort(1), any keys
that are not used for decoration are passed to 'sort'
(with their field numbers adjusted to account for the added
decoration fields).
Example:
$ printf "%-2s %d\n" C 4 IC 1 I 107 II 4 C 31 I 19 \
| ./decorate -k1,1:roman -k2nr,2
I 107
I 19
II 4
IC 1
C 31
C 4
$ printf "%-2s %d\n" C 4 IC 1 I 107 II 4 C 31 I 19 \
| ./decorate -k2n,2 -k1,1:roman
IC 1
II 4
C 4
I 19
C 31
I 107
To better understand which parameters are passed to sort(1),
use "--print-sort-args", which only prints the arguments that would
be passed to sort(1) and does not decorate or sort the input.
Here, "decorate" knows that a new field will be added
(the converted roman numerals), and so the "-k2nr,2"
is adjusted to be "-k3,3nr":
$ ./decorate --print-sort-args -k1,1:roman -k2nr,2
sort -k1,1 -k3,3nr
Here, "decorate" will add two fields (first the ipv4 conversion of
field 2, then the roman-numeral conversion of field 3). The "-k5,5V"
is adjusted to be "-k7,7V":
$ ./decorate --print-sort-args -k5,5V -k2,2:ipv4 -k3,3:roman
sort -k7,7V -k1,1 -k2,2
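The adjustment rule seen in these examples can be sketched as follows (hypothetical function names and buffer sizes): keys with a conversion become -k1,1, -k2,2, ... in command-line order, and every other key has its field numbers increased by the number of decoration fields. Unlike decorate's own output, this sketch leaves flags where they were written, so "-k2nr,2" becomes "-k3nr,3"; for sort(1) that is equivalent to "-k3,3nr" (the position of a flag only matters for 'b').

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Rewrite the field numbers of one sort(1) key ("2nr,2", "1.3b,2") by
   adding 'shift'; character offsets and flags stay where they were. */
static int shift_key(const char *key, long shift, char *buf, size_t size)
{
    size_t used = 0;
    const char *p = key;
    for (int pos = 0; pos < 2; pos++) {
        char *end;
        long field = strtol(p, &end, 10);
        if (end == p || field < 1)
            return -1;
        int n = snprintf(buf + used, size - used, "%ld", field + shift);
        if (n < 0 || (size_t)n >= size - used)
            return -1;
        used += (size_t)n;
        /* Copy ".CHAR" and flags, including the ',' that starts POS2. */
        const char *comma = pos == 0 ? strchr(end, ',') : NULL;
        size_t rest = comma != NULL ? (size_t)(comma - end) + 1 : strlen(end);
        if (rest >= size - used)
            return -1;
        memcpy(buf + used, end, rest);
        used += rest;
        buf[used] = '\0';
        if (comma == NULL)
            break;
        p = comma + 1;
    }
    return 0;
}

/* Turn decorate-style -k arguments into sort(1) ones. Keys with a ':'
   conversion become -k1,1, -k2,2, ... (the prepended decoration fields,
   in order); all other keys are shifted past those fields. */
static int build_sort_keys(const char *const *keys, size_t n, char out[][32])
{
    long ndecor = 0;
    for (size_t i = 0; i < n; i++)
        if (strchr(keys[i], ':') != NULL)
            ndecor++;

    long next = 0;
    for (size_t i = 0; i < n; i++) {
        const char *colon = strchr(keys[i], ':');
        if (colon != NULL) {
            next++;
            int len = snprintf(out[i], 32, "-k%ld,%ld", next, next);
            /* Keep flag letters such as 'r' written before the ':'. */
            for (const char *c = keys[i]; c < colon && len < 31; c++)
                if (isalpha((unsigned char)*c))
                    out[i][len++] = *c;
            out[i][len] = '\0';
        } else {
            strcpy(out[i], "-k");
            if (shift_key(keys[i], ndecor, out[i] + 2, 30) != 0)
                return -1;
        }
    }
    return 0;
}
```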
#### Other sort(1) parameters ####
When 'decorate' automatically runs sort(1), several common sort(1)
options are accepted and passed as-is to sort.
Example:
$ ./decorate --print-sort-args -k2,2:ipv4 \
--stable \
-T /foo/bar \
-S 2G \
-t: \
--parallel 32
sort -k1,1 -s -T /foo/bar -S 2G -t : --parallel 32
The above example only prints the arguments; the same
arguments are passed to sort(1) when "--print-sort-args"
is not used.
#### Future improvements ####
I plan to also add a "--header" option - something that has
been requested many times for sort(1).
Since we're not worried about bloat here, and we're already
manipulating the input and output of sort as a child process,
it will be easy to implement.
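One possible shape for the header step, as a sketch of a hypothetical --header (no such option exists yet): copy the first line through unchanged and flush it, then keep reading the same stream and decorate and pipe only the remaining lines to sort(1).

```c
#define _POSIX_C_SOURCE 200809L /* getline() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical --header step: copy the first line of 'in' to 'out'
   unchanged. The caller then reads the remaining lines from the same
   FILE * and sends them to sort. Returns 0, or -1 on error/empty input. */
static int pass_header(FILE *in, FILE *out)
{
    char *line = NULL;
    size_t cap = 0;
    ssize_t len = getline(&line, &cap, in);
    int rc = -1;
    /* Flush right away, so the header cannot end up after output that
       another process (e.g. a child sort) writes to the same descriptor. */
    if (len > 0 && fwrite(line, 1, (size_t)len, out) == (size_t)len
        && fflush(out) == 0)
        rc = 0;
    free(line);
    return rc;
}
```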
There is also a plan to add an option to specify an external program
as the conversion filter, e.g.:
-k1,1@/foo/bar/filter.sh
which will send the keys to the script.
The argument parser already supports this syntax, but the actual
implementation is missing.
#### Adding conversions ####
The file 'src/decorate-functions.c' contains the built-in conversion
functions. The implementation is very simple: each function accepts a
"const char*" and prints the converted (decorated) representation
to STDOUT.
It will be easy to add more conversions (assuming the conversion
rules are solid and will 'just work' with regular sort(1) alphabetic
order).
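As a sketch of what adding a conversion involves, assuming a hypothetical function-pointer type and lookup table (the real layout of src/decorate-functions.c may differ): each conversion takes the key as a "const char*" and prints its decoration to STDOUT, and registering a new one means adding one function and one table row.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical signature: print the decoration of one key to stdout. */
typedef bool (*conversion_fn)(const char *in);

static bool conv_as_is(const char *in)
{
    return fputs(in, stdout) != EOF;
}

static bool conv_strlen(const char *in)
{
    /* Zero-padded so byte-wise order in sort(1) equals numeric order. */
    return printf("%021zu", strlen(in)) > 0;
}

struct conversion {
    const char *name;
    conversion_fn func;
};

static const struct conversion conversions[] = {
    { "as-is",  conv_as_is  },
    { "strlen", conv_strlen },
    /* A new conversion is one more function plus one more row here. */
};

/* Look up a conversion by the name used after ':' in -k. */
static const struct conversion *find_conversion(const char *name)
{
    for (size_t i = 0; i < sizeof conversions / sizeof conversions[0]; i++)
        if (strcmp(conversions[i].name, name) == 0)
            return &conversions[i];
    return NULL;
}
```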