Re: sshloginfile changes like -j procfile
From: Ole Tange
Subject: Re: sshloginfile changes like -j procfile
Date: Mon, 27 Jun 2011 00:24:55 +0200
On Thu, Jun 23, 2011 at 2:47 AM, Jon Wilson <parallel@wilsonjc.us> wrote:
> 1) submit jobs that will take several hours to run, during which time I
> won't have anything else in particular to do
> 2) Go work on bringing cluster nodes back up
> 3) Change ~/.parallel/sshloginfile
> 4) GNU parallel notices that the file has changed, just like if I were
> using -j procfile, and immediately starts jobs on those additional nodes.
>
> I am using parallel 20110522. Is this behavior already implemented? If
> not, I would like to request this feature.
That is currently not implemented.
A workaround for you may be to put all the nodes in
~/.parallel/sshloginfile and use --retries to retry the job if it fails
on a node (e.g. because the node is not up). You should set --retries to
number_of_nodes_down+1, so that if GNU Parallel retries on another
node that is also down, it will keep retrying until it finds one that
is up.
This abuses --retries: if a job actually _does_ fail, then you
will run that job number_of_nodes_down+1 times.
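A rough sketch of the workaround, in case it helps. It assumes 3 nodes
are currently down; ./long_job.sh and the input names are made up for
illustration. The option is spelled --retries in the GNU Parallel
manual. The sketch only prints the command line, so it is safe to run
anywhere:

```shell
#!/bin/sh
# Assumption: 3 of the nodes listed in the sshloginfile are currently down.
nodes_down=3

# One retry per node that is down, plus one for a node that is up.
retries=$((nodes_down + 1))

# ./long_job.sh and input1..input3 are hypothetical placeholders.
# Remove the leading "echo" to actually start the jobs.
echo parallel --sshloginfile "$HOME/.parallel/sshloginfile" \
     --retries "$retries" ./long_job.sh {} ::: input1 input2 input3
```

Remember to adjust nodes_down before each run, since a value that is
too low can leave a job without a working node to land on.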
If you still want the feature, file a wishlist item at
https://savannah.gnu.org/bugs/?group=parallel
/Ole