Re: unexpected behavior when using GNU parallel with block and recstart


From: juncus
Subject: Re: unexpected behavior when using GNU parallel with block and recstart to break up fasta file
Date: Fri, 30 Mar 2012 09:13:22 -0700

Hi Ole, thanks for the reply.

Not quite.  True, I am observing the same thing (empty files 12
through 20 below), but what is bothering me is file #11.  It holds only
13 bytes, which would easily have fit into file #10 (1092 bytes): the
combined 1105 bytes is still well below the 1200-byte --block
threshold.

Another way to ask the question:  Will parallel always assume the last
record is partial if you only provide --recstart?  For some file types
it is not feasible to provide a --recend (FASTA files, for example,
where all you can rely on is the ">" that marks the start of each
record's header).  In these situations, will parallel always kick the
last single record into its own solitary process?

-rw-rw-r-- 1 staff staff 1092 Mar 29 16:25 partialpseudofasta_10.txt
-rw-rw-r-- 1 staff staff   13 Mar 29 16:25 partialpseudofasta_11.txt
-rw-rw-r-- 1 staff staff    0 Mar 29 16:25 partialpseudofasta_12.txt
-rw-rw-r-- 1 staff staff    0 Mar 29 16:25 partialpseudofasta_13.txt
-rw-rw-r-- 1 staff staff    0 Mar 29 16:25 partialpseudofasta_14.txt
-rw-rw-r-- 1 staff staff    0 Mar 29 16:25 partialpseudofasta_15.txt
-rw-rw-r-- 1 staff staff    0 Mar 29 16:25 partialpseudofasta_16.txt
-rw-rw-r-- 1 staff staff    0 Mar 29 16:25 partialpseudofasta_17.txt
-rw-rw-r-- 1 staff staff    0 Mar 29 16:25 partialpseudofasta_18.txt
-rw-rw-r-- 1 staff staff    0 Mar 29 16:25 partialpseudofasta_19.txt
-rw-rw-r-- 1 staff staff 1188 Mar 29 16:25 partialpseudofasta_1.txt
-rw-rw-r-- 1 staff staff    0 Mar 29 16:25 partialpseudofasta_20.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_2.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_3.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_4.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_5.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_6.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_7.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_8.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_9.txt
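
To make the expectation concrete, here is a minimal Python sketch of the
packing I had in mind.  It is not how GNU parallel is implemented; the
function names split_records and pack_blocks are made up for this
illustration, and it assumes every record begins with ">" at the start
of a line.  Whole records are appended to the current block until the
next one would push it over the block size.

```python
def split_records(data: bytes, recstart: bytes = b">") -> list[bytes]:
    """Split data into records that each begin with recstart at line start.

    Assumption for this sketch: a record boundary is a newline
    immediately followed by recstart.
    """
    records = []
    start = 0
    pos = data.find(b"\n" + recstart)
    while pos != -1:
        # Keep the newline with the record that it terminates.
        records.append(data[start:pos + 1])
        start = pos + 1
        pos = data.find(b"\n" + recstart, start)
    if start < len(data):
        # The final record has no following recstart; treat it as complete.
        records.append(data[start:])
    return records


def pack_blocks(records: list[bytes], block_size: int) -> list[bytes]:
    """Greedily pack whole records into blocks of at most block_size bytes.

    A single record larger than block_size still gets a block of its own.
    """
    blocks = []
    current = b""
    for rec in records:
        if current and len(current) + len(rec) > block_size:
            blocks.append(current)
            current = b""
        current += rec
    if current:
        blocks.append(current)
    return blocks
```

Under this scheme a trailing 13-byte record following 1092 bytes of
earlier records would land in the same block at --block 1200, because
1092 + 13 = 1105 does not exceed 1200.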


Thanks
Owen



On Thu, Mar 29, 2012 at 5:40 PM, Ole Tange <tange@gnu.org> wrote:
> On Fri, Mar 30, 2012 at 1:54 AM,  <juncus@gmail.com> wrote:
>> Hello,
>>
>> I don't need to say how great GNU parallel is (GREAT!).
>
> Good to hear.
>
>>  But for the
>> first time, I have encountered a behavior I didn't expect from it.  I
>> am trying to break up a big input FASTA file (DNA sequence) using the
>> --block and --recstart options.  But it always seems to create ONE
>> more file than I really want.  I mean, if I have specified 10 jobs (-j
>> 10), and if the block size on the 10th job is still below my
>> specification (--block 1200), why does it make an 11th file?  This
>> means that 10 jobs in parallel run, and then 1 MORE job has to run to
>> get the last record.
>
> It sounds like: https://savannah.gnu.org/bugs/?34241
>
> /Ole


