From a8ae1f29a96b47b9a9c2b26875bd41bfa124e83b Mon Sep 17 00:00:00 2001
From: Assaf Gordon
Date: Sun, 30 Dec 2018 12:21:31 -0700
Subject: [PATCH] doc: add examples of shuf/sort -R

Requested by Dan Jacobson in https://bugs.gnu.org/33025 .

* doc/coreutils.texi (randomizing files): New section.
---
 doc/coreutils.texi | 148 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 148 insertions(+)

diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 8d303cd56..e05b34ab1 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -276,6 +276,7 @@ Operating on sorted files
 * comm invocation:: Compare two sorted files line by line
 * ptx invocation:: Produce a permuted index of file contents
 * tsort invocation:: Topological sort
+* randomizing files:: Producing random output
 
 @command{ptx}: Produce permuted indexes
 
@@ -4192,6 +4193,7 @@ These commands work with (or produce) sorted files.
 * comm invocation:: Compare two sorted files line by line.
 * ptx invocation:: Produce a permuted index of file contents.
 * tsort invocation:: Topological sort.
+* randomizing files:: Producing random output
 @end menu
 
 
@@ -6018,6 +6020,152 @@ Anyhow, that's where tsort came from.
 To solve an old problem with the way the linker handled archive files,
 which has since been solved in different ways.
 
+@node randomizing files
+@section Producing random output
+
+The @command{shuf} and @command{sort -R/--random-sort} commands read input
+(sorted or not) and output its lines in a randomized order.
+@command{shuf} shuffles all input lines equally, regardless of their content.
+@command{sort -R} shuffles the @emph{keys} of the input lines;
+lines with identical sort keys are grouped together:
+
+@multitable @columnfractions .5 .5
+@item
+@example
+$ printf '%s\n' A A A B B C D D | shuf
+A
+C
+D
+D
+A
+B
+A
+B
+@end example
+@tab
+@example
+$ printf '%s\n' A A A B B C D D | sort -R
+C
+D
+D
+A
+A
+A
+B
+B
+@end example
+@end multitable
+
+@command{shuf -n @var{count}} outputs at most @var{count} lines (i.e.,
+a sub-sample). @command{sort --random-sort --uniq} outputs one line of each
+group in a random order:
+
+@multitable @columnfractions .5 .5
+@item
+@example
+$ printf '%s\n' A A A B B C D D | shuf -n5
+B
+D
+A
+D
+B
+@end example
+@tab
+@example
+$ printf '%s\n' A A A B B C D D | sort -R -u
+C
+A
+B
+D
+@end example
+@end multitable
+
+@command{sort} operates on keys. Random and non-random keys can be combined
+to achieve desired results. In the following examples, the input file @file{in}
+contains these lines:
+
+@example
+$ cat in
+A 5
+A 3
+A 7
+B 6
+B 4
+C 4
+D 9
+D 8
+@end example
+
+@command{sort -R} without explicit keys operates on entire lines,
+which can produce unexpected results (@samp{A 5} and @samp{A 3} do not
+have identical keys, so they are not grouped together):
+
+@example
+$ sort -R in
+A 7
+C 4
+A 3
+D 8
+B 6
+B 4
+A 5
+D 9
+@end example
+
+Specifying an explicit key to sort randomly puts the keyed
+column (letters) in random order (with identical keys grouped together),
+and sorts the other column (digits) alphabetically (by the default
+last-resort comparison):
+
+@example
+$ sort -k1,1R in
+C 4
+A 3
+A 5
+A 7
+B 4
+B 6
+D 8
+D 9
+@end example
+
+
+In the following example, the first column (letters) is sorted in
+reverse alphabetical order, and the second column (digits) is sorted
+randomly:
+
+@example
+$ sort -k1,1r -k2,2R in
+D 8
+D 9
+C 4
+B 6
+B 4
+A 7
+A 3
+A 5
+@end example
+
+
+To randomize a single column and keep the input order of all other
+columns, use the @option{-s/--stable} option. In the following example
+the letters are grouped in random order, while the digits stay
+in the same order as in the input file (i.e., the digits in group @samp{A}
+will always be @samp{5}, @samp{3}, @samp{7}, exactly as in the input file):
+
+@example
+$ sort -k1,1R -s in
+D 9
+D 8
+B 6
+B 4
+A 5
+A 3
+A 7
+C 4
+@end example
+
+
 @node Operating on fields
 @chapter Operating on fields
-- 
2.11.0
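
The claim in the last example (letter groups in random order, digits kept in input order within each group) can be checked with a short script. This is only a sketch, not part of the patch: it assumes GNU sort (for the R key flag) and mktemp are available, the temporary file stands in for the sample file "in", and the variable names are illustrative.

```shell
#!/bin/sh
# Sketch only: assumes GNU sort (for the R key flag) and mktemp.
# The temporary file stands in for the sample input file "in".
in_file=$(mktemp)
printf '%s\n' 'A 5' 'A 3' 'A 7' 'B 6' 'B 4' 'C 4' 'D 9' 'D 8' > "$in_file"

# Randomize the order of the letter groups; -s keeps input order within a group.
out=$(sort -k1,1R -s "$in_file")
rm -f "$in_file"

# Digits of group A in output order; -s should preserve the input order 5, 3, 7.
a_digits=$(printf '%s\n' "$out" | awk '$1 == "A" { printf "%s", $2 }')

# Number of runs of identical letters; 4 means each letter forms one group.
groups=$(printf '%s\n' "$out" | cut -d' ' -f1 | uniq | wc -l | tr -d ' ')

echo "A digits: $a_digits, groups: $groups"
```

The order of the groups changes from run to run, but the two derived values should not: the digits of group A stay in input order, and each letter appears as a single contiguous group.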