
Re: Dynamically changing remote servers list


From: Douglas A. Augusto
Subject: Re: Dynamically changing remote servers list
Date: Sun, 14 Sep 2014 02:31:09 -0300
User-agent: Mutt/1.5.23 (2014-03-12)

On 28/08/2014 at 00:34,
Ole Tange <ole@tange.dk> wrote:

> No idea, but it is very likely that the code has bugs: It is the
> youngest code and there is no testing of that part of the code. If you
> can show something reproducible then let us fix it. Your description
> is unfortunately not enough for me to see if the bug is in GNU
> Parallel and in that case where the bug is.

Hi,

I've figured out what is going wrong: every time the ssh login file is reloaded
it seems that a number of jobs are immediately launched on *all* servers,
regardless of their load (number of slots currently used).

Suppose we have the following entries in the ssh login file:

   1/server1.net
   5/server5.net

and that there are 1 and 5 jobs currently running on server1.net and
server5.net, respectively.

If this file is reread for whatever reason, then GNU Parallel will launch 1
more job on server1.net (thus a total of 2 jobs) and 5 more jobs on
server5.net, totaling 10 jobs there. After that, and if the ssh login file
doesn't change anymore, GNU Parallel will behave as expected and will only
start new jobs if there is no job running on server1.net and fewer than 5 jobs
running on server5.net. Unfortunately, for any large set of reasonably
long-running jobs, when the ssh login file changes frequently (which is common
on unreliable machines/networks), GNU Parallel will end up launching far more
jobs on each server than it has slots for, effectively rendering them
inoperative.
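
To make the arithmetic concrete, here is a toy Python model of the behaviour
described above. It is only a sketch and is not GNU Parallel's code: the
Scheduler class, its load() and fill() methods and the keep_counts flag are
all hypothetical names, and the assumption that a reload forgets the per-host
count of used slots is a guess at the cause, not something confirmed from the
source. Jobs are assumed to run long enough that none finish between reloads.

```python
# Toy model of the reported behaviour; this is NOT GNU Parallel's code.
# Assumption: rereading the login file rebuilds the host table and
# forgets how many slots are already in use on each host.

class Scheduler:
    def __init__(self, login_lines):
        self.running = {}  # host -> jobs really running (ground truth)
        self.slots = {}    # host -> configured slot count
        self.used = {}     # host -> slots the scheduler believes are in use
        self.load(login_lines)

    def load(self, login_lines, keep_counts=False):
        """Parse 'N/host' entries, as in the ssh login file."""
        old_used = self.used
        self.slots, self.used = {}, {}
        for line in login_lines:
            n, host = line.strip().split("/", 1)
            self.slots[host] = int(n)
            # keep_counts=False models the suspected bug: counters reset to 0.
            self.used[host] = old_used.get(host, 0) if keep_counts else 0
            self.running.setdefault(host, 0)

    def fill(self):
        """Start jobs until every host looks full to the scheduler."""
        for host, n in self.slots.items():
            free = n - self.used[host]
            if free > 0:
                self.used[host] += free
                self.running[host] += free


logins = ["1/server1.net", "5/server5.net"]

buggy = Scheduler(logins)
buggy.fill()          # 1 job on server1.net, 5 on server5.net
buggy.load(logins)    # file reread: used-slot counters are forgotten
buggy.fill()
print(buggy.running)  # {'server1.net': 2, 'server5.net': 10}

fixed = Scheduler(logins)
fixed.fill()
fixed.load(logins, keep_counts=True)  # counters survive the reload
fixed.fill()
print(fixed.running)  # {'server1.net': 1, 'server5.net': 5}
```

In this model every reload adds one more full set of jobs, so after k reloads
a host with n slots carries n * (k + 1) jobs, which matches the runaway growth
seen when the login file changes often.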

-- 
Douglas A. Augusto


