bug-parallel

GNU Parallel Bug Reports Job failure semantics


From: Alastair Andrew
Subject: GNU Parallel Bug Reports Job failure semantics
Date: Tue, 13 Sep 2011 00:45:03 +0100

Hi,

I've been using GNU parallel for a while now to distribute jobs across about 30 
machines in my department's lab. I've set up the .parallel/sshloginfile to list 
all the machines and their sizes so I can just use the -S .. flag to spread the 
load. I tend to use the strictest error handling, but I find it a bit 
overzealous, especially with regard to ssh failures. Often one or two machines 
will be offline for maintenance (or undergrads will have unwittingly switched 
them off). When GNU parallel tries to log in to one of these machines it 
cannot, and it flags this as a failure, which terminates all my jobs. 

Currently I see two options: keep the .parallel/sshloginfile synced with the 
currently accessible machines, or choose a large enough retry limit that this 
problem won't be encountered as parallel tries to compensate. Neither seems 
perfect. I think it would be better if GNU parallel didn't regard its own ssh 
failure as an error. After all, it's not the user's task that has failed; the 
job hasn't started, GNU parallel has failed to distribute it. This would allow 
users to specify a static pool of machines without worrying too much whether a 
few were down. In the worst case the majority of machines might be unreachable, 
leaving only a few to do all the work. For that case there could be a 
threshold below which parallel informs the user. 
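
As a rough illustration of the first option, combined with the threshold idea, 
here is a minimal Python sketch that filters an sshloginfile down to the 
machines that currently accept connections. It is not part of GNU parallel. 
The helper names (host_of, ssh_port_open, filter_reachable), the 0.5 
threshold, and the use of a plain TCP connect to port 22 as the reachability 
test are all assumptions made for the example, and it only understands simple 
entries of the form [ncpu/][user@]host plus ":" for the local machine. 

```python
import socket


def host_of(entry):
    """Extract the host from a simple '[ncpu/][user@]host' entry (assumed format)."""
    login = entry.split("/", 1)[-1]   # drop an optional 'ncpu/' prefix
    return login.split("@", 1)[-1]    # drop an optional 'user@' prefix


def ssh_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to the ssh port succeeds (assumed test)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False


def filter_reachable(lines, is_up=ssh_port_open, min_fraction=0.5):
    """Return (kept_entries, warning); warning is None above the threshold."""
    entries = [ln.strip() for ln in lines]
    # Skip blank lines and '#' comment lines.
    entries = [e for e in entries if e and not e.startswith("#")]
    # ':' means the local machine, so it is always kept.
    kept = [e for e in entries if e == ":" or is_up(host_of(e))]
    warning = None
    if entries and len(kept) / len(entries) < min_fraction:
        warning = "only %d of %d machines reachable" % (len(kept), len(entries))
    return kept, warning
```

The kept entries could be written to a temporary file and handed to parallel 
with --sshloginfile in place of -S .., so the static list stays untouched. 
The reachability check is passed in as a parameter so that it can be swapped 
for something stricter, such as an actual ssh login attempt. 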

Anyway, I don't know what anyone else's opinion on the matter is; I just 
thought it might simplify the process for users. 
Cheers,
Alastair
---------------------------------------------------------
Alastair Andrew,
address@hidden
Department of Computer and Information Sciences,
University of Strathclyde.
Tel: 0141 548 3138    Fax: 0141 548 4523
The University of Strathclyde is a charitable body, registered in Scotland, 
with registration number SC015263.



