Re: Advice on using parallel as a batch queue system


From: Tong, Lianheng
Subject: Re: Advice on using parallel as a batch queue system
Date: Wed, 9 Jan 2013 15:26:05 +0000

Hi Ole,

> It seems the warning is not working any more. But the essence is that
> if you have 4 JobSlots, then you have to submit 4 jobs before they
> start. After that you can submit one at a time.
> 
> After the first 4 jobs have completed, output of every job is delayed
> by 4 jobs (unless you use --ungroup), but the job is run immediately.


Thank you very much; your explanation makes a lot of sense. As long as this is 
the expected behaviour it is fine, since this slight inconvenience is easy to 
circumvent with a shell script wrapper. 
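A minimal sketch of such a wrapper in POSIX shell. It assumes two things: the queue is fed from a file as in the man page's queue example (`true >jobqueue; tail -n+0 -f jobqueue | parallel`), and the behaviour is as Ole describes it (nothing starts until JobSlots jobs are queued, and each job's output waits for JobSlots later jobs). `submit_job`, `JOBQUEUE` and `JOBSLOTS` are hypothetical names chosen for illustration, not GNU parallel options. `JOBSLOTS` has to be kept equal to the `-j` value by hand. Each real job is followed by JobSlots dummy `true` jobs, which exit at once and print nothing. This follows the same idea as the man page's "dummy jobs (e.g. echo foo)" workaround.

```shell
#!/bin/sh
# Hypothetical wrapper around a file-based GNU parallel queue.
# Assumes the queue was started as in the man page's queue example, e.g.:
#   true >jobqueue; tail -n+0 -f jobqueue | parallel -j 4
# JOBQUEUE and JOBSLOTS are illustrative variables, not GNU parallel settings.

JOBQUEUE=${JOBQUEUE:-jobqueue}
JOBSLOTS=${JOBSLOTS:-4}   # keep equal to the -j value passed to parallel

submit_job() {
    # Append the real job as one line of shell for parallel to run.
    printf '%s\n' "$*" >> "$JOBQUEUE"
    # Pad with JOBSLOTS dummy jobs: "true" exits at once and prints nothing,
    # so the queue always holds JOBSLOTS jobs after the real one.
    _i=0
    while [ "$_i" -lt "$JOBSLOTS" ]; do
        printf 'true\n' >> "$JOBQUEUE"
        _i=$((_i + 1))
    done
}
```

For example, `submit_job 'command-1 input-1'` appends that command followed by four `true` lines to `jobqueue`. Under the behaviour described above, the real job should then start without waiting for further submissions, and its output should not be held back waiting for later real jobs. The cost is one empty job per slot for every submission, which is negligible next to long simulation runs.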

One other thing I have noticed: when running scripts which themselves invoke 
mpirun -np <n_processes>, parallel still runs nslot jobs at once, using 
nslot * <n_processes> cores in total, i.e. in most cases overloading the 
system. 

Of course, using parallel with MPI is probably well beyond the original purpose 
of the tool, and one should perhaps use a proper batch queue/parallel resource 
management tool (although most of the tools I know of or can find are oriented 
towards HPC platforms, and would be too heavyweight for workstations with a 
dozen or so cores).

My suggested modifications to the man page, in the "queue" section:

 * You will get a warning if you do not submit JobSlots jobs within
the first second. E.g. if you have 8 cores and use -j+2 you have to
submit 10 jobs. These can be dummy jobs (e.g. echo foo). You can also
simply ignore the warning. For parallel versions <VERSION> and higher,
the warning will not appear.

 * You have to submit JobSlots jobs before any of them will start. After
that you can submit one at a time, and each job will start immediately
if a free slot is available. Output from running or completed jobs is
withheld and will only be printed when JobSlots more jobs have been
started (unless you use --ungroup or -u, in which case the output from
the jobs is printed immediately). E.g. if you have 10 jobslots then the
output from the first completed job will only be printed when job 11
has started, the output of the second completed job will only be
printed when job 12 has started, and so on.

 * GNU parallel does not monitor the resource usage of the jobs you have
submitted, and assumes each job is serial. So if, for example, your
job-script.sh contains calls such as

    mpirun -np n command-1 input-1
    mpirun -np n command-2 input-2
    ...
    mpirun -np n command-N input-N

then

    parallel -j N < job-script.sh

will start all N mpirun commands in job-script.sh at once (one per job
slot), and hence the total CPU usage will be n*N cores.
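The point above can be turned around to choose a safe -j by hand. With C cores and every line of job-script.sh running `mpirun -np n`, at most C/n (rounded down) such jobs fit without oversubscription. For example, 12 cores with -np 4 gives -j 3. What follows is a minimal POSIX shell sketch. `mpi_jobslots` is a hypothetical helper, not part of GNU parallel. It assumes all jobs use the same n, and it uses coreutils `nproc` for the default core count.

```shell
#!/bin/sh
# Hypothetical helper: how many MPI jobs of NP ranks each fit on CORES cores.
# GNU parallel counts each mpirun line as one job, so -j must be reduced by
# hand to keep the total (jobs x NP) within the core count.

mpi_jobslots() {
    np=$1
    cores=${2:-$(nproc)}   # default: processing units reported by nproc
    slots=$(( cores / np ))
    # Always allow at least one job, even if NP exceeds the core count.
    if [ "$slots" -lt 1 ]; then
        slots=1
    fi
    echo "$slots"
}

# Example use (commented out so the sketch runs without GNU parallel):
#   parallel -j "$(mpi_jobslots 4)" < job-script.sh
```

On a 12-core workstation, `parallel -j "$(mpi_jobslots 4)" < job-script.sh` would run three 4-rank jobs at a time, i.e. 12 busy cores instead of n*N. Note that `nproc` reports processing units, which includes hyperthreads, so passing the physical core count as the second argument may be the better choice on such machines.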

Many thanks, and best regards,

Lianheng

==========================================================
Lianheng Tong                                     Tel: +44 20 7679 3302
London Centre For Nanotechnology                  Fax: +44 20 7679 0595
University College London                                address@hidden
17–19 Gordon Street, London WC1H 0AH, U.K.
==========================================================

On 9 Jan 2013, at 09:56, Ole Tange wrote:

> On Tue, Jan 8, 2013 at 6:22 PM, Tong, Lianheng <address@hidden> wrote:
> 
>> This is my first post and I am very pleased to have discovered gnu parallel 
>> about a month ago. It has saved me a lot of hassle while running a lot of 
>> serial simulation jobs. So a big thanks to the developers :-)
> 
> I am happy it helped you. If you like GNU Parallel:
> 
> * Post the intro videos on Reddit/Diaspora*/forums/blogs/
>  Identi.ca/Google+/Twitter/Facebook/Linkedin/mailing lists
> * Get the merchandise https://www.gnu.org/s/parallel/merchandise.html
> * Give a demo at your local user group/team/colleagues
> * Request or write a review for your favourite blog or magazine
> * Request or build a package for your favourite distribution (if it is
> not already there)
> * Invite me for your next conference (Contact http://ole.tange.dk)
> 
> If you use GNU Parallel for research:
> 
> * Please cite GNU Parallel in your publications (use --bibtex)
> 
> If GNU Parallel saves you money:
> 
> * (Have your company) donate to FSF https://my.fsf.org/donate/
> 
>> I have however encountered some problems when trying to use parallel as a 
>> batch queue,
> :
>> So it seems that jobs are only run AFTER job-slots number (in this case 4) 
>> of jobs have been submitted. If I have only sent 3 jobs to the queue, then 
>> none of the jobs are executed until I send the SIGTERM to parallel to 
>> terminate the queue.
> 
> From 'man parallel':
> 
>  There are two small issues when using GNU parallel as a queue
> system/batch manager:
> 
>  * You will get a warning if you do not submit JobSlots jobs within
> the first second. E.g. if you have 8 cores and use -j+2 you have to
> submit 10 jobs. These can be dummy jobs (e.g. echo foo). You can also
> simply ignore the warning.
> 
>  * Jobs will be run immediately, but output from jobs will only be
> printed when JobSlots more jobs have been started. E.g. if you have 10
> jobslots then the output from the first completed job will only be
> printed when job 11 is started.
> 
> It seems the warning is not working any more. But the essence is that
> if you have 4 JobSlots, then you have to submit 4 jobs before they
> start. After that you can submit one at a time.
> 
> After the first 4 jobs have completed, output of every job is delayed
> by 4 jobs (unless you use --ungroup), but the job is run immediately.
> 
> There are 2 ways from here:
> 
> * Help re-phrasing the text in the man-page, so it is more clear what
> is going on.
> 
> * Submit a patch that fixes the bug (without breaking/slowing down the
> rest). Since this is a bug that:
>    - does not affect me
>    - does not seem critical
>    - has a work-around
>    - I do not get paid to fix
>  then I am unlikely to spend time fixing it, but I will look
> favourably at a patch that fixes it.
> 
> Hope that clears up the situation.
> 
> 
> /Ole
> 




