Job Processing Was RE: Parallel Merge
From: Nathan Watson-Haigh
Subject: Job Processing Was RE: Parallel Merge
Date: Tue, 23 Aug 2011 16:08:56 +0930
Hi Ole,
I'm in the middle of optimising the processing of such a file. While I'm at it,
I have a quick question:
I'm processing my input file with -N 2500, as it seems to give me the best
processing time. My CPU usage and disk IO are well below their maximum capacity,
and I wondered how GNU parallel processes/submits new jobs as others are
completed. I suspect that GNU parallel is somehow stalling my pipeline.
Could you provide some information on this aspect of GNU parallel?
Cheers,
Nathan
Nathan Watson-Haigh
Senior Bioinformatician | The Australian Wine Research Institute
Waite Precinct, Hartley Grove cnr Paratoo Road, Urrbrae (Adelaide) SA 5064 |
http://www.awri.com.au/contact/map.asp
PO Box 197, Glen Osmond SA 5064, Australia
T: +61 8 83136836 (direct) | T: +61 8 83136600 | F: +61 8 83136601
www: http://www.awri.com.au/ | http://www.awri.com.au/events/calendar/
-----Original Message-----
> From: ole.tange@gmail.com [mailto:ole.tange@gmail.com] On Behalf Of Ole
> Tange
> Sent: Saturday, 20 August 2011 11:49 PM
> To: Nathan Watson-Haigh
> Cc: parallel@gnu.org
> Subject: Re: Parallel Merge
>
> On Sat, Aug 20, 2011 at 12:54 AM, Nathan Watson-Haigh
> <nathan.watson-haigh@awri.com.au> wrote:
> >
> > What I'm actually doing is using the ABySS genome assembler. Part of
> > the pipeline is:
> >
> > KAligner | ParseAligns | sort | DistanceEst
> >
> > KAligner takes sequences from one file (queries) and finds alignments
> > against sequences in another file (targets), outputting these in
> > Sequence Alignment/Map (SAM) format. ParseAligns takes the SAM format
> > and filters out some alignments. It is the ParseAligns step which is
> > slowest and I'm looking at how best to split up the work to make use of
> > more cores. A job for early next week!
>
> The most obvious way seems to be:
>
> cat queries | parallel --pipe --files 'KAligner | ParseAligns | sort' |
> parallel -Xj1 sort -m {}\;rm {} | DistanceEst
>
> Would that work?
>
> /Ole
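
For readers following the thread: the idea behind the command quoted above is
to sort each chunk independently and then combine the already-sorted chunks
with "sort -m", which merges without re-sorting. Below is a minimal sketch of
that chunk/sort/merge pattern using only split, sort and shell background jobs.
It does not use GNU parallel or the ABySS tools. The input words, the chunk
size of 3 and the file names are made up for illustration, and "mktemp -d" is
assumed to be available.

```shell
#!/bin/sh
# Sketch of chunk -> sort -> merge with plain POSIX-style tools.
# All data and file names here are hypothetical stand-ins.
set -eu
LC_ALL=C; export LC_ALL            # byte-wise ordering, so results are stable

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT      # clean up temporary files on exit

# Stand-in for the stream that would come out of the filtering step.
printf '%s\n' pear apple fig kiwi date plum lime cherry > "$workdir/input.txt"

# Cut the input into chunks of 3 lines each (chunk.aa, chunk.ab, ...).
split -l 3 "$workdir/input.txt" "$workdir/chunk."

# Sort every chunk as a background job, then wait for all of them.
for f in "$workdir"/chunk.*; do
    sort "$f" > "$f.sorted" &
done
wait

# Merge the sorted chunks; -m only interleaves, it does not re-sort.
merged=$(sort -m "$workdir"/chunk.*.sorted)
printf '%s\n' "$merged"
```

The merge step is cheap compared with a full sort, which is why it can run as
a single job at the end while the per-chunk work is spread over the cores.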