Adding a --reduce option to GNU parallel


From: Diomidis Spinellis
Subject: Adding a --reduce option to GNU parallel
Date: Thu, 19 Dec 2019 10:50:49 +0200
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101 Thunderbird/60.9.1

Currently GNU parallel outputs the results of its jobs either in the order in which they complete or, with --keep-order, in the order in which they were specified. How would you feel about adding a --reduce option that specifies a command for combining the results? The command would take as arguments the files (or file descriptors via /dev/fd/) holding each job's output, and its own output would become the final output of parallel.
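
Until such an option exists, the proposed semantics can be approximated in plain bash. The sketch below is only an illustration under stated assumptions: it requires bash, and reduce_jobs is a hypothetical helper (not part of GNU parallel). It runs one job per argument concurrently, stores each job's output in its own temporary file, and then passes those files, in specification order, to the reduce command.

```bash
#!/usr/bin/env bash
# Minimal sketch of the proposed --reduce semantics, emulated in plain bash.
# reduce_jobs is a hypothetical helper, not a GNU parallel option.
# Usage: reduce_jobs REDUCE_CMD JOB_CMD ARG...
reduce_jobs() {
  local reduce=$1 job=$2
  shift 2
  [ "$#" -gt 0 ] || return 0          # no jobs, nothing to reduce
  local tmp arg f i=0 status=0
  local -a files=()
  tmp=$(mktemp -d) || return
  for arg in "$@"; do
    i=$((i + 1))
    # Zero-padded names record the order in which the jobs were specified.
    f=$(printf '%s/%06d' "$tmp" "$i")
    files+=("$f")
    # $job is unquoted on purpose so that e.g. 'sort -r' splits into words.
    $job "$arg" > "$f" &
  done
  wait                                # every job's output is now complete
  # The reduce command receives one output file per job, in order.
  $reduce "${files[@]}" || status=$?
  rm -rf "$tmp"
  return "$status"
}
```

For example, reduce_jobs cat echo a b c prints a, b and c on separate lines, while reduce_jobs 'sort -m' echo c a b merges the three one-line outputs into a, b, c.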

Here are some examples.

parallel --reduce cat
is the same as parallel --keep-order
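
The equivalence holds because cat reads its arguments strictly left to right, no matter which job finishes first. The bash sketch below illustrates this with process substitution, which on Linux hands the command /dev/fd/ paths much as the proposal describes. slow_echo is a hypothetical stand-in for a job with a variable running time, and the fractional sleep assumes GNU coreutils.

```bash
#!/usr/bin/env bash
# Sketch: why '--reduce cat' would behave like --keep-order.
# Each <( ... ) runs as a separate, concurrent process; bash passes cat one
# /dev/fd/N path (or a named pipe where /dev/fd is unavailable) per process.
# slow_echo is a hypothetical job: wait $2 seconds, then print $1.
slow_echo() { sleep "$2"; echo "$1"; }

# "second" finishes first, but cat drains its arguments left to right,
# so the output follows the order in which the jobs were specified.
out=$(cat <(slow_echo first 0.3) <(slow_echo second 0.1) <(slow_echo third 0.2))
printf '%s\n' "$out"   # prints first, second, third on separate lines
```

One design consequence: with pipes rather than files, a fast job whose output exceeds the pipe buffer simply blocks until cat reaches it, so preserving order costs some concurrency instead of temporary disk space.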

parallel --pipepart --reduce 'sort -m' sort :::: file
will sort the file in parallel and then merge-sort the parts.
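
The same split, sort, and merge data flow can be sketched with GNU coreutils alone. In this sketch parallel_sort is a hypothetical name, GNU split -n l/N stands in for --pipepart by cutting the input into N pieces without breaking lines, and the default of four pieces is an arbitrary choice.

```bash
#!/usr/bin/env bash
# Sketch of what 'parallel --pipepart --reduce "sort -m" sort :::: file'
# would do, using GNU coreutils only. parallel_sort is a hypothetical name.
# Usage: parallel_sort FILE [PIECES]
parallel_sort() {
  local input=$1 pieces=${2:-4} tmp f
  tmp=$(mktemp -d) || return
  # Cut the input into PIECES parts without splitting any line.
  split -n "l/$pieces" "$input" "$tmp/part."
  # Sort every part concurrently, in place (sort may write over its input).
  for f in "$tmp"/part.*; do
    sort -o "$f" "$f" &
  done
  wait
  # Reduce step: merge the sorted parts; -m assumes each input is sorted.
  sort -m "$tmp"/part.*
  rm -rf "$tmp"
}
```

Run in the same locale, parallel_sort file should produce the same output as sort file; only the sorting of the parts happens concurrently, while the final merge is a single sequential pass.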

<directories parallel --reduce 'tar --concatenate' tar cf -
will create a single tar file from the tar archives created in parallel.
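
A sketch of the same map-and-reduce shape for tar, assuming GNU tar: one archive per directory is created concurrently, and the archives are then joined with --concatenate (-A). The helper name concat_tars is hypothetical. Plain cat would not suffice, because each archive ends with zero-filled end-of-archive blocks, so a reader of a cat-joined archive would need --ignore-zeros to see past the first one.

```bash
#!/usr/bin/env bash
# Sketch: build one tar archive per directory in parallel, then combine them.
# Assumes GNU tar. concat_tars is a hypothetical helper name.
# Usage: concat_tars OUTPUT.tar DIR...
concat_tars() {
  local out=$1 tmp d t i=0 first=1
  shift
  [ "$#" -gt 0 ] || return 1
  tmp=$(mktemp -d) || return
  # Map step: one archive per directory, all created concurrently.
  for d in "$@"; do
    i=$((i + 1))
    tar -cf "$(printf '%s/%06d.tar' "$tmp" "$i")" "$d" &
  done
  wait
  # Reduce step: start from the first archive, then append the others one
  # at a time with --concatenate so the result reads as a single archive.
  for t in "$tmp"/*.tar; do
    if [ "$first" -eq 1 ]; then
      cp "$t" "$out"
      first=0
    else
      tar --concatenate --file="$out" "$t"
    fi
  done
  rm -rf "$tmp"
}
```

Appending one archive per tar invocation keeps the reduce step simple; listing the result with tar -tf should show the members of every directory without needing --ignore-zeros.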

parallel --pipepart --reduce dgsh-merge-sum \
  "tr -s ' \t\n\r\f' '\n' | sort | uniq -c" :::: file
will count the number of times each word appears in the specified input file. (The dgsh-merge-sum command sums sorted output from uniq -c; see https://github.com/dspinellis/dgsh/blob/master/core-tools/src/dgsh-merge-sum.pl.)
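
The sketch below does not use dgsh-merge-sum itself; merge_sum is a hypothetical awk stand-in that only illustrates the idea. Each part's uniq -c output is already sorted by word, so the parts can be merged with sort -m on the word field and the counts of adjacent equal words added up. It assumes bash (for process substitution) and sets LC_ALL=C so that every sort uses the same ordering.

```bash
#!/usr/bin/env bash
# Sketch of a word count whose reduce step merges and sums sorted counts.
# merge_sum is a hypothetical awk stand-in, NOT dgsh-merge-sum itself.
export LC_ALL=C   # every sort below must use the same collation order

# Per-part job, as in the example: one word per line, sorted, counted.
count_words() { tr -s ' \t\n\r\f' '\n' < "$1" | sort | uniq -c; }

# Reduce step: merge "count word" lines on the word field (each input is
# already in word order), then add up the counts of adjacent equal words.
# word = $2 "" forces string comparison, so "1" and "1.0" stay distinct.
merge_sum() {
  sort -m -k2,2 "$@" |
    awk 'NR > 1 && $2 != word { print sum, word; sum = 0 }
         { word = $2 ""; sum += $1 }
         END { if (NR > 0) print sum, word }'
}

# Example: count two parts concurrently, then combine their counts.
d=$(mktemp -d)
printf 'b a b\n' > "$d/part1"
printf 'a c\nb\n' > "$d/part2"
out=$(merge_sum <(count_words "$d/part1") <(count_words "$d/part2"))
printf '%s\n' "$out"   # prints: 2 a, 3 b, 1 c (one per line)
rm -rf "$d"
```

Because every input to the reduce step is sorted, merge_sum needs only one pass and constant memory per word, which is the property that makes a merge-style reducer attractive here.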

--
Diomidis
