parallel

Re: How should --onall work?


From: Hans Schou
Subject: Re: How should --onall work?
Date: Mon, 6 Jun 2011 15:30:56 +0200

I might be doing something wrong that makes it take longer. What is
the right syntax for what I am trying to do? (I expect you can read my
mind.)

$ time ./src/parallel "ssh {} 'hostname ; uptime'" ::: c d e f
castor
 13:18:12 up 48 days,  6:14,  1 user,  load average: 0.00, 0.00, 0.00
elvis
 13:18:12 up 32 days,  2:59,  2 users,  load average: 0.34, 0.34, 0.28
frank
 13:18:12 up 160 days,  1:45,  0 users,  load average: 4.74, 5.83, 5.70
daimi
 15:18:12 up 47 days, 22:58,  0 users,  load average: 0.00, 0.00, 0.00

real    0m1.031s
user    0m0.152s
sys     0m0.032s

$ time ./src/parallel --onall -S c,d,e,f "hostname ; uptime #" ::: 1
castor
 13:18:18 up 48 days,  6:14,  1 user,  load average: 0.00, 0.00, 0.00
elvis
 13:18:19 up 32 days,  2:59,  2 users,  load average: 0.32, 0.33, 0.28
frank
 13:18:19 up 160 days,  1:45,  0 users,  load average: 4.76, 5.82, 5.70
daimi
 15:18:19 up 47 days, 22:58,  0 users,  load average: 0.00, 0.00, 0.00

real    0m1.291s
user    0m0.564s
sys     0m0.112s
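The trailing # in the --onall command above appears to be a workaround: when a command contains no replacement string such as {}, GNU Parallel appends the argument to the end of the command line, so without the # the remote shell would receive "hostname ; uptime 1". The sketch below reproduces that locally with sh -c in place of ssh; append_arg and run_local are hypothetical helpers, not part of GNU Parallel.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: mimic GNU Parallel appending the argument to a command
# that has no replacement string, then run the result in a local shell
# (standing in for the shell that ssh would start on the remote host).
append_arg() {
  printf '%s %s' "$1" "$2"
}

run_local() {
  sh -c "$(append_arg "$1" "$2")"
}

# With the trailing '#', the appended "1" is a shell comment and is ignored.
run_local 'echo up #' 1
# Without it, "1" becomes an extra argument to the last command.
run_local 'echo up' 1
```

The first call prints "up" and the second prints "up 1"; with uptime instead of echo, the stray "1" would be passed to uptime as an extra argument.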


/hans
2011/5/26 Ole Tange <tange@gnu.org>:
> I have been convinced that GNU Parallel should have an --onall option.
>
>       --onall (unimplemented)
>                Run all the jobs on all computers given with --sshlogin. GNU
>                parallel will log into --jobs number of computers in parallel
>                and run one job at a time on each computer. The order of the
>                jobs will not be changed, but some computers may finish
>                before others.
>
> I intend this:
>
>  parallel --onall -S eos,iris '(echo {3} {2}) | awk \{print\ \$2}' ::: a b c ::: 1 2 3
>
> to do:
>
>  parallel -S eos '(echo {3} {2}) | awk \{print\ \$2}' ::: a b c ::: 1 2 3
>  parallel -S iris '(echo {3} {2}) | awk \{print\ \$2}' ::: a b c ::: 1 2 3
>
> In practice I believe this could easily be implemented by having GNU
> Parallel call parallel like this:
>
>  parallel -a /tmp/abc -a /tmp/123 -j1 -S eos '(echo {3} {2}) | awk \{print\ \$2}'
>  parallel -a /tmp/abc -a /tmp/123 -j1 -S iris '(echo {3} {2}) | awk \{print\ \$2}'
>
> where I simply put 'a\nb\nc\n' and '1\n2\n3\n' into /tmp/abc and
> /tmp/123 respectively. Since the arguments are already put into
> temporary files, the change may be small. I believe this would work
> out fine.
>
> A small penalty is that if you run n jobs in parallel and have 2n
> hosts, it will do all the jobs for hosts 1..n first and then all the
> jobs for hosts n+1..2n. It will not run the first job on all hosts
> first and then the second.
>
> - o -
>
> I have a harder time figuring how to deal with stdin:
>
>  cat | parallel --onall -S eos,iris
>
> This should run whatever comes from cat on both eos and iris. With two
> hosts that is easy:
>
>  cat | tee >(ssh eos) >(ssh iris) >/dev/null
>
> but it becomes harder if you have so many hosts (10000s) that you cannot
> log in to all of them at the same time.
>
> This one is also tricky, as the {n} replacement strings have to keep
> working:
>
>  cat | parallel --onall -S eos,iris '(echo {3} {2}) | awk \{print\ \$2}' :::: - ::: a b c ::: 1 2 3
>
> Maybe the solution is to accept that we have to read all of stdin
> first, put that in a file and use -a as above?
>
> So the tricky one will be executed like:
>
>  # Stuff everything from stdin into a file
>  cat > /tmp/stdin
>  # Call parallel for each host in parallel
>  parallel -a /tmp/stdin -a /tmp/abc -a /tmp/123 -j1 -S eos '(echo {3} {2}) | awk \{print\ \$2}' &
>  parallel -a /tmp/stdin -a /tmp/abc -a /tmp/123 -j1 -S iris '(echo {3} {2}) | awk \{print\ \$2}' &
>  # Wait for both hosts to finish
>  wait
>
> The price is that if a slow program is generating stdin, that program
> has to finish before GNU Parallel can even begin executing the jobs.
> Ideally GNU Parallel should start executing the jobs that it already
> knows have to be run.
>
> One way of solving that would be having a job queue for each sshlogin.
> That, however, looks like a big change to the code.
>
> - o -
>
> If you want to use GNU Parallel to run the same commands on a list of
> hosts, please describe your situation so the design will work well. At
> the very least I need to know:
>
> * number of hosts (can we just log in to all of them simultaneously?)
> * number of commands to be run (is it just 1, or is it a script
>   generated on stdin?)
> * is speed an issue? (would it be OK to ssh for each command?)
> * how are the commands generated? (is it a fast program, so it is OK
>   to read everything before executing anything?)
>
>
> /Ole
>
>
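The temp-file plan in the quoted proposal can be sketched as a dry run: write each ::: group to its own file, then print one -j1 invocation per host that reads those same files. onall_plan is a hypothetical name; it only prints command lines (it never calls parallel or ssh) and assumes each argument group is passed as one newline-separated string.

```bash
#!/usr/bin/env bash
# Hypothetical helper (dry run only): expand an --onall request into one
# 'parallel -a FILE... -j1 -S HOST COMMAND' line per host.
#   onall_plan HOSTS COMMAND GROUP...
# HOSTS is comma-separated like -S; each GROUP is a newline-separated list
# holding the arguments that would have followed one ':::'.
onall_plan() {
  local hosts=$1 cmd=$2
  shift 2
  local -a files=() host_list=()
  local group f host
  for group in "$@"; do
    f=$(mktemp)
    printf '%s\n' "$group" > "$f"   # one argument per line, like /tmp/abc
    files+=(-a "$f")
  done
  IFS=, read -ra host_list <<< "$hosts"
  for host in "${host_list[@]}"; do
    # Print, do not run; %q quotes each word so the line can be pasted into bash.
    # The temporary files are left in place for the printed commands to use.
    printf 'parallel'
    printf ' %q' "${files[@]}" -j1 -S "$host" "$cmd"
    printf '\n'
  done
}

onall_plan eos,iris 'echo {1}-{2}' $'a\nb\nc' $'1\n2\n3'
```

Both printed commands point at the same argument files, so each host gets the full set of a/b/c by 1/2/3 jobs, run one at a time on that host.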

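The ordering penalty of the temp-file approach (n jobs in parallel, 2n hosts) can be made concrete with a small simulation that only prints which (host, job) pairs would be running at each time step. It assumes every per-host -j1 run holds one of the n slots until all of its jobs are done and that every job takes the same time; simulate_order is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical simulation of the temp-file approach: SLOTS per-host runs at a
# time (the outer -j), each host running NJOBS jobs one after another (-j1).
# Every job is assumed to take one time unit, so hosts start in fixed batches.
simulate_order() {
  local slots=$1 njobs=$2
  shift 2
  local -a hosts=("$@")
  local t=1 start job i line
  for (( start = 0; start < ${#hosts[@]}; start += slots )); do
    for (( job = 1; job <= njobs; job++ )); do
      line="t=$t:"
      for (( i = start; i < start + slots && i < ${#hosts[@]}; i++ )); do
        line+=" ${hosts[i]}:$job"
      done
      echo "$line"
      t=$((t + 1))
    done
  done
}

# n = 2 slots, 2n = 4 hosts, 3 jobs per host
simulate_order 2 3 h1 h2 h3 h4
# t=1: h1:1 h2:1
# t=2: h1:2 h2:2
# t=3: h1:3 h2:3
# t=4: h3:1 h4:1
# t=5: h3:2 h4:2
# t=6: h3:3 h4:3
```

Under these assumptions h3 and h4 see their first job only at t=4, after h1 and h2 have finished everything, whereas running the first job on all hosts before the second could start job 1 on h3 and h4 as early as t=2.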

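For the stdin case with more hosts than can be logged into at once, one option, assuming it is acceptable to read all of stdin before starting (the compromise suggested in the quoted message), is to buffer stdin in a file once and replay it to every host with a cap on concurrent connections. This is a plain bash sketch, not GNU Parallel code; run_on_host and onall_stdin are hypothetical names, and wait -n requires bash 4.3 or later.

```bash
#!/usr/bin/env bash
# Hypothetical hook: how the buffered script is run on one host.
# Override it (e.g. for local testing) before calling onall_stdin.
run_on_host() {
  ssh "$1" sh
}

# onall_stdin MAX HOST...: read all of stdin once, then feed it to every HOST,
# with at most MAX connections open at the same time (needs bash >= 4.3).
onall_stdin() {
  local max=$1
  shift
  local script host running=0
  script=$(mktemp)
  cat > "$script"                 # buffer stdin before anything runs
  for host in "$@"; do
    if (( running >= max )); then
      wait -n                     # block until one connection finishes
      running=$((running - 1))
    fi
    run_on_host "$host" < "$script" &
    running=$((running + 1))
  done
  wait                            # wait for the remaining connections
  rm -f "$script"
}
```

A call such as "cat commands.sh | onall_stdin 100 host1 host2 host3" (hypothetical file and host names) would keep at most 100 ssh sessions open; output from different hosts can interleave, which GNU Parallel's grouping of output would avoid.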
