Re: How should --onall work?
From: Hans Schou
Subject: Re: How should --onall work?
Date: Mon, 6 Jun 2011 15:30:56 +0200
I might be doing something wrong that makes it take longer. What is
the right syntax for what I am trying to do? (I expect you can read my
mind.)
$ time ./src/parallel "ssh {} 'hostname ; uptime'" ::: c d e f
castor
13:18:12 up 48 days, 6:14, 1 user, load average: 0.00, 0.00, 0.00
elvis
13:18:12 up 32 days, 2:59, 2 users, load average: 0.34, 0.34, 0.28
frank
13:18:12 up 160 days, 1:45, 0 users, load average: 4.74, 5.83, 5.70
daimi
15:18:12 up 47 days, 22:58, 0 users, load average: 0.00, 0.00, 0.00
real 0m1.031s
user 0m0.152s
sys 0m0.032s
$ time ./src/parallel --onall -S c,d,e,f "hostname ; uptime #" ::: 1
castor
13:18:18 up 48 days, 6:14, 1 user, load average: 0.00, 0.00, 0.00
elvis
13:18:19 up 32 days, 2:59, 2 users, load average: 0.32, 0.33, 0.28
frank
13:18:19 up 160 days, 1:45, 0 users, load average: 4.76, 5.82, 5.70
daimi
15:18:19 up 47 days, 22:58, 0 users, load average: 0.00, 0.00, 0.00
real 0m1.291s
user 0m0.564s
sys 0m0.112s
/hans
2011/5/26 Ole Tange <tange@gnu.org>:
> I have been convinced that GNU Parallel should have an --onall option.
>
> --onall (unimplemented)
> Run all the jobs on all computers given with --sshlogin. GNU
> parallel will log into --jobs number of computers in parallel
> and run one job at a time on each computer. The order of the
> jobs will not be changed, but some computers may finish
> before others.
>
> I intend this:
>
> parallel --onall -S eos,iris '(echo {3} {2}) | awk \{print\ \$2}'
> ::: a b c ::: 1 2 3
>
> to do:
>
> parallel -S eos '(echo {3} {2}) | awk \{print\ \$2}' ::: a b c ::: 1 2 3
> parallel -S iris '(echo {3} {2}) | awk \{print\ \$2}' ::: a b c ::: 1 2 3
>
> In practice I believe this could be easily implemented by having GNU
> Parallel call parallel like this:
>
> parallel -a /tmp/abc -a /tmp/123 -j1 -S eos '(echo {3} {2}) | awk
> \{print\ \$2}'
> parallel -a /tmp/abc -a /tmp/123 -j1 -S iris '(echo {3} {2}) | awk
> \{print\ \$2}'
>
> where I simply put 'a\nb\nc\n' and '1\n2\n3\n' into /tmp/abc and
> /tmp/123 respectively. As the arguments are already being put into
> temporary files, the change should be small. I believe this would
> work out fine.
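
The temp-file expansion just described can be sketched as a plain
shell loop. This is only a dry run: `echo` prints the per-host
invocation instead of executing it, and the hostnames are just the
examples from the mail:

```shell
# Write each ::: argument group to its own temporary file, then emit
# one parallel invocation per host. Nothing is run over ssh here;
# this only illustrates the proposed expansion.
printf 'a\nb\nc\n' > /tmp/abc
printf '1\n2\n3\n' > /tmp/123
for host in eos iris; do
  echo parallel -a /tmp/abc -a /tmp/123 -j1 -S "$host" \
    "'(echo {3} {2}) | awk {print \$2}'"
done
```

Dropping the `echo` (and having real hosts named eos and iris) would
turn the dry run into the two per-host invocations shown above.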
>
> A small penalty is that if you run n jobs in parallel and have 2n
> hosts, it will do all the jobs for hosts 1..n first and then all the
> jobs for hosts n+1..2n. It will not run the first job on all hosts
> first and then the second.
>
> - o -
>
> I have a harder time figuring out how to deal with stdin:
>
> cat | parallel --onall -S eos,iris
>
> This should run whatever comes from cat on both eos and iris. While
> this is easy for a couple of hosts:
>
> cat | tee >(ssh eos) >(ssh iris) >/dev/null
>
> it becomes harder if you have so many hosts (10000s) that you cannot
> log in to all of them at the same time.
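
One way to handle that many hosts is to run the fan-out in fixed-size
batches, so only a limited number of logins are open at once. A
minimal POSIX-sh sketch, where `run_on_host` is a placeholder for the
real ssh call and the host names are made up:

```shell
# Placeholder for: ssh "$1" 'some command'
run_on_host() { echo "would log in to $1"; }

max_logins=3   # cap on simultaneous ssh sessions
i=0
for host in h1 h2 h3 h4 h5 h6 h7; do
  run_on_host "$host" &
  i=$((i + 1))
  if [ "$i" -ge "$max_logins" ]; then
    wait   # let the current batch of logins finish first
    i=0
  fi
done
wait       # wait for the final, partial batch
```

Batching is cruder than a proper sliding window (a slow host stalls
its whole batch), but it needs no job-control features beyond `wait`.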
>
> Also this one is tricky as you have to keep the {n} working:
>
> cat | parallel --onall -S eos,iris '(echo {3} {2}) | awk \{print\
> \$2}' :::: - ::: a b c ::: 1 2 3
>
> Maybe the solution is to accept that we have to read all of stdin
> first, put that in a file and use -a as above?
>
> So the tricky one will be executed like:
>
> # Stuff everything from stdin into a file
> cat > /tmp/stdin
> # Call parallel for each host in parallel
> parallel -a /tmp/stdin -a /tmp/abc -a /tmp/123 -j1 -S eos '(echo {3}
> {2}) | awk \{print\ \$2}' &
> parallel -a /tmp/stdin -a /tmp/abc -a /tmp/123 -j1 -S iris '(echo
> {3} {2}) | awk \{print\ \$2}' &
>
> The price will be that if you have a slow program generating the
> stdin, then that program has to finish before GNU Parallel can even
> begin executing the jobs. Ideally GNU Parallel should start executing
> the jobs that it already knows have to be run.
>
> One way of solving that would be having a jobqueue for each sshlogin.
> That, however, looks like a big change to the code.
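
A per-sshlogin jobqueue could be prototyped with one FIFO per host:
`tee` duplicates each incoming job line into every queue, and one
worker per host drains its queue as lines arrive, so execution can
start before stdin is complete. A rough sketch — the worker only
records what it would run, a real version would pipe each line over
ssh, and the paths and hostnames are illustrative:

```shell
hosts="eos iris"

# One FIFO per host acts as that host's jobqueue; one background
# worker per host drains it, logging what it would run.
for h in $hosts; do
  rm -f "/tmp/queue.$h" "/tmp/done.$h"
  mkfifo "/tmp/queue.$h"
  ( while IFS= read -r job; do
      echo "$h would run: $job"   # placeholder for: ssh "$h" "$job"
    done < "/tmp/queue.$h" > "/tmp/done.$h" ) &
done

# Feed jobs as they are produced; tee copies every line into each queue.
printf 'hostname\nuptime\n' | tee /tmp/queue.eos /tmp/queue.iris > /dev/null

wait   # workers exit on EOF once the feeder closes the FIFOs
rm -f /tmp/queue.eos /tmp/queue.iris
```

Each worker already runs its jobs in order and independently of the
other hosts, which is essentially the per-sshlogin queue described
above, though wiring this into GNU Parallel's own code is of course
the hard part.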
>
> - o -
>
> People wanting to use GNU Parallel for running the same commands on a
> list of hosts: please describe your situation, so the design will
> work well. At the very least I need to know:
>
> * number of hosts (can we just log in to all of them simultaneously?)
> * number of commands to be run (is it just 1 or is it a script
> generated on stdin?)
> * is speed an issue? (would it be OK to ssh for each command?)
> * how are the commands generated? (is it a fast program, so it is OK
> to read everything before executing anything?)
>
>
> /Ole
>
>