Re: Parallel + SSH = slow


From: Ole Tange
Subject: Re: Parallel + SSH = slow
Date: Sat, 11 Sep 2010 17:09:00 +0200

On Sat, Sep 11, 2010 at 12:12 AM, Christopher Sebastian
<csebastian3@gmail.com> wrote:
> Hi Ole and everyone,
>
> GNU Parallel is very nice!  I find that it is amazingly fast for tasks
> performed on the local host.  :)

Happy you like it. Please spread the word as it seems a lot of people
have never heard of GNU Parallel. You can do that:

* by writing a blog entry on GNU Parallel
* by requesting that GNU Parallel be adopted in your favorite distribution
* by writing or requesting an article about GNU Parallel in your
favorite magazine
* by commenting on articles that could be improved by using GNU Parallel

Feel free to link to the intro video: http://nd.gd/0s

> ...But once I start using remote hosts with --sshloginfile, the
> startup time is way too long.  It takes around 30 seconds per host to
> initialize (so, if I put 20 hosts into the sshloginfile, 10 minutes
> are spent just on 'parallel' startup).

How long does this take:

  ssh yourhost echo

If that takes 30 seconds, then you probably have a DNS timeout problem.
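
To put a number on that test, the probe can be wrapped in a small timer. This is only a minimal sketch: `time_cmd` is a hypothetical helper name, `yourhost` is a placeholder, and `date +%s` is assumed to be available (it is on GNU and BSD systems).

```shell
#!/bin/sh
# time_cmd: hypothetical helper that prints the whole seconds of
# wall-clock time a command takes. The command's output is discarded.
time_cmd() {
  start=$(date +%s)
  "$@" >/dev/null 2>&1
  end=$(date +%s)
  echo $((end - start))
}

# Intended use (yourhost is a placeholder, so it is left commented out):
#   time_cmd ssh yourhost echo
```

If the printed value is close to 30 for a single host, the delay is in the SSH connection setup itself rather than in GNU Parallel.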

Have you had a look at -M (--controlmaster)? It is experimental, as it
fails now and then.

> ALSO, I am finding that the tasks are only sent to two or three of the
> remote hosts, and all the other hosts are doing 0% of the work.  (I am
> sure that it is not a problem with the remote host because if I remove
> the 'successful' hosts from the sshloginfile, then the work will be
> sent to another two or three *different* servers instead.)

This can be caused by the following situation:

server1: 8 cores
server2: 8 cores
server3: 8 cores
jobs to run: 15

In this case Parallel may choose to send all the jobs to server1 and
server2: assuming one job per core, their 16 job slots already cover
all 15 jobs, so server3 gets no work.
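
The arithmetic behind that situation can be sketched with a toy model. The "fill each server before moving on" rule and the name `fill_first` are assumptions made for illustration; this is not GNU Parallel's actual scheduling code.

```shell
#!/bin/sh
# fill_first JOBS SLOTS...: toy model that hands out JOBS to servers in
# order, filling each server's slots before moving to the next one.
# Prints the number of jobs each server receives.
fill_first() {
  jobs_left=$1; shift
  out=
  for slots in "$@"; do
    take=$slots
    [ "$jobs_left" -lt "$slots" ] && take=$jobs_left
    jobs_left=$((jobs_left - take))
    out="$out${out:+ }$take"
  done
  echo "$out"
}

fill_first 15 8 8 8   # prints: 8 7 0
```

Under this model the third server stays idle until there are more than 16 jobs to run.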

> Here is the little "scanner" script that I am running.  (It attempts
> to SSH into each host in my network and prints a CONNECTED message if
> successful):
>
> (for a in $(seq 144 153); do
>   for b in $(seq 1 254); do
>     echo 10.72.$a.$b
>   done
>  done) | parallel --progress -j0 --sshloginfile parallel_sshloginfile
> ssh -oBatchMode=yes -oUserKnownHostsFile=/dev/null
> -oStrictHostKeyChecking=no -oConnectTimeout=3 {} bash -c \\\'echo
> CONNECTED:{}:\\\$\\\(hostname\\\):\\\$\\\(domainname\\\)\\\'
>
> Here is my 'parallel_sshloginfile':
>
> :
> 10.72.144.199
> 10.72.144.200
> 10.72.144.201

I imagine you want to use sshlogins to log in to remote hosts to get
more jobs running in parallel than would be possible using just -j0.
Unfortunately that is not how GNU Parallel works. GNU Parallel will
open a connection to the sshlogin for each and every job, so the
total number of jobs running in parallel is limited by how many jobs
you can run on your local system.

So in this case it will probably not make sense to use --sshloginfile
- plus it makes your script harder to read because of the extra level
of quoting.

Please test if this works instead:

(for a in $(seq 144 153); do
  seq 1 254 | parallel echo 10.72.$a.{}
done) | parallel -j0 ssh -oBatchMode=yes \
  -oUserKnownHostsFile=/dev/null -oStrictHostKeyChecking=no \
  -oConnectTimeout=3 {} \
  'echo CONNECTED:{}:\$\(hostname\):\$\(domainname\)'
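
Before pointing ssh at the whole range, it may help to check the address-generating half on its own. The sketch below uses plain shell loops instead of parallel so it runs anywhere; `gen_addrs` is a hypothetical name introduced here.

```shell
#!/bin/sh
# gen_addrs: print every address from 10.72.144.1 to 10.72.153.254,
# one per line, in the same order as the loop in the script.
gen_addrs() {
  for a in $(seq 144 153); do
    for b in $(seq 1 254); do
      echo "10.72.$a.$b"
    done
  done
}

gen_addrs | wc -l       # 10 subnets * 254 hosts = 2540 lines
gen_addrs | head -n 1   # first address: 10.72.144.1
gen_addrs | tail -n 1   # last address: 10.72.153.254
```

With -j0 each of those 2540 addresses becomes one local ssh process, which is why the local system sets the limit on how many run at once.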

> Thank you for any advice.
>
> By the way, is there a mailing list archive anywhere so I don't ask
> questions that have already been answered before?

Currently the traffic is so tiny that you do not really need to worry
about that:

http://lists.gnu.org/archive/html/parallel/


/Ole


