[Savannah-register-public] [task #7250] Submission of parallel
From: Ole Tange
Subject: [Savannah-register-public] [task #7250] Submission of parallel
Date: Mon, 27 Aug 2007 14:20:48 +0000
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) Gecko/20070723 Iceweasel/2.0.0.6 (Debian-2.0.0.6-0etch1+lenny1)
URL:
<http://savannah.gnu.org/task/?7250>
Summary: Submission of parallel
Project: Savannah Administration
Submitted by: tange
Submitted on: Monday 08/27/2007 at 16:20
Should Start On: Monday 08/27/2007 at 00:00
Should be Finished on: Thursday 09/06/2007 at 00:00
Category: Project Approval
Priority: 5 - Normal
Status: None
Privacy: Public
Percent Complete: 0%
Assigned to: None
Open/Closed: Open
Discussion Lock: Any
Effort: 0.00
_______________________________________________________
Details:
A new project has been registered at Savannah
This project account will remain inactive until a site admin approves or
discards the registration.
= Registration Administration =
While this item will be useful to track the registration process, *approving
or discarding the registration must be done using the specific Group
Administration
<https://savannah.gnu.org/siteadmin/groupedit.php?group_id=9478> page*,
accessible only to site administrators, effectively *logged as site
administrators* (superuser):
* Group Administration
<https://savannah.gnu.org/siteadmin/groupedit.php?group_id=9478>
= Registration Details =
* Name: *parallel*
* System Name: *parallel*
* Type: non-GNU software & documentation
* License: GNU General Public License v3 or later
----
==== Description: ====
NAME
parallel - run jobs in parallel
SYNOPSIS
parallel [-g] [-j N] [-s] [command] < list_of_arguments
DESCRIPTION
For each line of input parallel will execute command with the line as
arguments. If no command is given the line is executed.
Several lines will be run in parallel.
command  If command contains {} every instance will be substituted with
         the arguments.
-g       Group output. Keep the output of each job from being mixed with
         the output of other jobs. Output is only printed when the job is
         done. stderr is merged with stdout.
-j N     Run N jobs in parallel. Default is 10.
-j +N    Add N to the number of CPUs. Run this many jobs in parallel.
         For compute intensive jobs -j +0 is useful.
-j -N    Subtract N from the number of CPUs. Run this many jobs in
         parallel. If the evaluated number is less than 1 then 1 will be
         used.
-j N%    Use N% of the number of CPUs. Run this many jobs in parallel.
         If the evaluated number is less than 1 then 1 will be used.
-s       Silent. Do not print the job to be run.
-x       eXact. Treat each line as exactly one argument. If the lines are
         filenames that may contain shell special characters (such as
         space or *) then this will protect the characters from being
         interpreted by the shell.
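The -j forms above can be read as a small calculation. The following is a minimal sketch of that calculation, not the code parallel itself uses. The function name resolve_jobs is hypothetical, and rounding N% down to a whole number is an assumption made for illustration.

```shell
# resolve_jobs SPEC NCPU (hypothetical helper, not part of parallel)
# Prints the number of jobs that a -j value SPEC would give on NCPU CPUs.
resolve_jobs() {
    spec=$1
    ncpu=$2
    case $spec in
        +*) add=${spec#+}; n=$(( ncpu + add )) ;;
        -*) sub=${spec#-}; n=$(( ncpu - sub )) ;;
        *%) pct=${spec%?}; n=$(( ncpu * pct / 100 )) ;;  # assumption: round down
        *)  n=$spec ;;
    esac
    # Never run fewer than one job.
    if [ "$n" -lt 1 ]; then
        n=1
    fi
    printf '%s\n' "$n"
}
```

On a machine with 4 CPUs, resolve_jobs +0 4 prints 4, resolve_jobs -8 4 prints 1, and resolve_jobs 50% 4 prints 2.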
EXAMPLE 1: Resource inexpensive jobs and grouping
A resource inexpensive job is a job that takes very little CPU, disk
I/O and network I/O. Ping is an example of a resource inexpensive
job. wget is too, if the webpages are small.
The content of the file jobs_to_run:
ping -c 1 10.0.0.1
wget http://status-server/status.cgi?ip=10.0.0.1
ping -c 1 10.0.0.2
wget http://status-server/status.cgi?ip=10.0.0.2
...
ping -c 1 10.0.0.255
wget http://status-server/status.cgi?ip=10.0.0.255
To run 100 processes simultaneously do:
parallel -j 100 < jobs_to_run
The output of the commands will run together. If it is important to
keep the outputs separated use -g (grouping):
parallel -gj 100 < jobs_to_run
This will print the output of each job only when the job is finished.
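The effect of grouping can be imitated by hand: hold back everything a job writes, and print it only when the job has finished. The sketch below does this with a temporary file. The name run_grouped is hypothetical, and whether parallel buffers output in this way is an assumption.

```shell
# run_grouped COMMAND (hypothetical helper, not part of parallel)
# Runs COMMAND, holds back its stdout and stderr in a temporary file,
# and prints everything in one go once the command has finished.
run_grouped() {
    tmp=$(mktemp) || return 1
    sh -c "$1" >"$tmp" 2>&1
    status=$?
    cat "$tmp"
    rm -f "$tmp"
    return "$status"
}
```

Started in the background, for instance as run_grouped 'ping -c 1 10.0.0.1' & run_grouped 'ping -c 1 10.0.0.2' & wait, each job's output would normally appear as one block instead of being mixed line by line.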
EXAMPLE 2: Argument appending, grouping, silent, and exact
parallel can work similarly to 'xargs -n1'.
To output all html files run:
find . -name '*.html' | parallel cat
As the output here will run together, grouping is advised:
find . -name '*.html' | parallel -g cat
If the output is to be used as input for another program it may be a
good idea not to print the command being run, using -s (silent):
find . -name '*.html' | parallel -sg cat
If some of the filenames have special characters (e.g. a file called
'**foo & bar*.html') then force the lines to be interpreted exactly with
-x:
find . -name '*.html' | parallel -xsg cat
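What -x has to guard against can be seen by quoting a filename by hand. The sketch below shows one common way of making a string safe for the shell, by wrapping it in single quotes. The name shell_quote is hypothetical, and whether parallel protects the characters in this way is an assumption.

```shell
# shell_quote STRING (hypothetical helper, not part of parallel)
# Prints STRING wrapped in single quotes so the shell treats every
# character literally. An embedded ' is written as '\''.
shell_quote() {
    printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}
```

With this, shell_quote '**foo & bar*.html' prints the name with the surrounding single quotes included, so the *, the & and the spaces reach the command unchanged.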
EXAMPLE 3: Compute intensive jobs and substitution
If ImageMagick is installed this will generate a thumbnail of a jpg
file:
convert -geometry 120 foo.jpg thumb_foo.jpg
If the system has more than 1 CPU it can be run with number-of-cpus
jobs in parallel (-j +0). This will do that for all jpg files in a
directory:
ls *.jpg | parallel -j +0 convert -geometry 120 {} thumb_{}
To do it recursively:
find . -name '*.jpg' | parallel -j +0 convert -geometry 120 {} {}_thumb.jpg
Notice how the output argument has to start with {}, as {} will include
the path (e.g. running
"convert -geometry 120 ./foo/bar.jpg thumb_./foo/bar.jpg"
would clearly be wrong). It will result in files like
./foo/bar.jpg_thumb.jpg. If that is not wanted this can fix it:
find . -name '*.jpg' | \
perl -pe 'chomp; $a=$_; s:/([^/]+)$:/thumb_$1:; $_="convert -geometry 120 $a $_\n"' | \
parallel -j +0
Unfortunately this will not work if the filenames contain special
characters (such as space or quotes). If you have ren installed this is a
better solution:
find . -name '*.jpg' | parallel -j +0 convert -geometry 120 {} {}_thumb.jpg
find . -name '*_thumb.jpg' | ren 's/_thumb.jpg//;s/^/thumb_/'
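The wanted name, with thumb_ in front of the file name and the directory left alone, can also be computed directly from the input path. A minimal sketch, assuming the standard dirname and basename utilities; the function name thumb_path is hypothetical.

```shell
# thumb_path PATH (hypothetical helper)
# Prints PATH with thumb_ put in front of the file name, keeping the directory.
thumb_path() {
    dir=$(dirname "$1")
    base=$(basename "$1")
    printf '%s/thumb_%s\n' "$dir" "$base"
}
```

For ./foo/bar.jpg this prints ./foo/thumb_bar.jpg. As the path is passed as one quoted argument, a space in the name is kept as it is.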
EXAMPLE 4: Substitution and redirection
This will compare all files in the dir to the file foo and save the
diffs in corresponding .diff files:
ls | parallel diff {} foo ">"{}.diff
Quoting of > is necessary to postpone the redirection. Another solution
is to quote the whole command:
ls | parallel "diff {} foo >{}.diff"
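Why the redirection has to be quoted becomes clearer when the substitution is spelled out: the command is kept as plain text until {} has been replaced, and only then handed to a shell. The sketch below imitates that order. The function name substitute is hypothetical, and the sketch makes no claim about how parallel is implemented.

```shell
# substitute TEMPLATE ARG (hypothetical helper, not part of parallel)
# Prints TEMPLATE with every {} replaced by ARG.
substitute() {
    template=$1
    arg=$2
    placeholder='{}'
    out=
    while :; do
        case $template in
            *"$placeholder"*)
                # Copy the text before the first {}, then the argument.
                out=$out${template%%"$placeholder"*}$arg
                # Continue with the text after that {}.
                template=${template#*"$placeholder"}
                ;;
            *)
                out=$out$template
                break
                ;;
        esac
    done
    printf '%s\n' "$out"
}
```

The resulting line, e.g. diff a.txt foo >a.txt.diff, still contains the > as text. The redirection only takes effect when that line is run by a shell, for instance with sh -c.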
EXAMPLE 5: Composed commands
A job can consist of several commands. This will print the number of
files in each directory:
ls | parallel -sg 'echo -n {}" "; ls {}|wc -l'
QUOTING
For more advanced use quoting may be an issue. The following will print
the filename for each line that has exactly 2 columns:
perl -ne '/^\S+\s+\S+$/ and print $ARGV,"\n"' file
To do that using parallel you will do something like this:
ls | parallel -sg "perl -ne '/^\\S+\\s+\\S+$/ and print \$ARGV,\"\\n\"'"
Notice how you need to quote every \, " and $.
To avoid dealing with the quoting problems it may be easier just to
write a small script and have parallel call that script.
BUGS
As parallel (ab)uses make to run the jobs in parallel, limitations from
make apply. For old versions of make (before 3.81) this means that the
initialization will take O(n*n) where n is the number of jobs to be
executed. To have a fair compromise in initialization I have picked a
chunk size of 5000. When 3.81 has been standard for a while this
chunk size should probably be removed. The cost of having the chunk
size, however, seems negligible.
AUTHOR
2007-07-23,2007-08-09 Ole Tange, http://ole.tange.dk
LICENSE
This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 3 of the License, or (at your
option) any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
Public License for more details.
You should have received a copy of the GNU General Public License along
with this program. If not, see <http://www.gnu.org/licenses/>.
DEPENDENCIES
parallel uses GNU Make, Perl, and the Perl module Getopt::Std.
SEE ALSO
make(1), xargs(1)
==== Other Software Required: ====
DEPENDENCIES
parallel uses GNU Make, Perl, and the Perl module Getopt::Std.
_______________________________________________________
Reply to this item at:
<http://savannah.gnu.org/task/?7250>
_______________________________________________
Message sent via/by Savannah
http://savannah.gnu.org/