Re: unexpected behavior when using GNU parallel with block and recstart
From: juncus
Subject: Re: unexpected behavior when using GNU parallel with block and recstart to break up fasta file
Date: Fri, 30 Mar 2012 09:13:22 -0700
Hi Ole, thanks for the reply.
Not quite. True, I am seeing the same thing (empty files 12 through 20
below), but what bothers me is file #11, which has only 13 bytes and
could easily have fit into file #10 (1092 bytes) while staying well
below the 1200-byte --block threshold (1092 + 13 = 1105).
Another way to ask this: if you provide only --recstart, will parallel
always assume the last record is partial? For some file types it may
not be feasible to provide a --recend. In FASTA files, for example, all
you can rely on is the ">" that marks the start of each record's header.
So in these situations, will parallel always push the last record into
its own separate job?
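To make the question concrete, here is a simplified model of a splitter that only knows the record start. It is NOT GNU parallel's actual code; the function `split_at_recstart` and its `eof_aware` flag are hypothetical. The model reads `block` bytes at a time, cuts just before the last ">" in the buffer (because whatever follows might be an incomplete record), and carries the remainder forward. Without knowing it has reached EOF, such a splitter leaves the final record over and emits it as a chunk of its own, which matches the 13-byte file #11. The hypothetical EOF-aware variant flushes the complete remainder as one chunk instead.

```python
def split_at_recstart(data: bytes, block: int, recstart: bytes,
                      eof_aware: bool = False) -> list[bytes]:
    """Hypothetical, simplified model of splitting a stream at record starts.

    Reads `block` bytes at a time, cuts each buffer just before the last
    occurrence of `recstart` (the text after it might be a partial record),
    and carries that remainder into the next read.
    """
    chunks: list[bytes] = []
    buf = b""
    pos = 0
    while pos < len(data):
        buf += data[pos:pos + block]
        pos += block
        if eof_aware and pos >= len(data):
            # Hypothetical variant: at EOF the remainder is known to be
            # complete records, so flush it below as a single chunk.
            break
        cut = buf.rfind(recstart, 1)  # ignore a match at offset 0
        if cut > 0:
            chunks.append(buf[:cut])
            buf = buf[cut:]
    if buf:
        chunks.append(buf)  # leftover at EOF becomes a chunk of its own
    return chunks


fasta = b">a\nAC\n>b\nGT\n>c\nTT\n"  # three 6-byte records
print([len(c) for c in split_at_recstart(fasta, 12, b">")])                  # [6, 6, 6]
print([len(c) for c in split_at_recstart(fasta, 12, b">", eof_aware=True)])  # [6, 12]
```

In this toy model the last record always ends up alone unless the splitter treats EOF as a record end. Note that the model lets a chunk grow to the carried-over remainder plus one block, so it does not enforce a hard size cap.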
-rw-rw-r-- 1 staff staff 1092 Mar 29 16:25 partialpseudofasta_10.txt
-rw-rw-r-- 1 staff staff 13 Mar 29 16:25 partialpseudofasta_11.txt
-rw-rw-r-- 1 staff staff 0 Mar 29 16:25 partialpseudofasta_12.txt
-rw-rw-r-- 1 staff staff 0 Mar 29 16:25 partialpseudofasta_13.txt
-rw-rw-r-- 1 staff staff 0 Mar 29 16:25 partialpseudofasta_14.txt
-rw-rw-r-- 1 staff staff 0 Mar 29 16:25 partialpseudofasta_15.txt
-rw-rw-r-- 1 staff staff 0 Mar 29 16:25 partialpseudofasta_16.txt
-rw-rw-r-- 1 staff staff 0 Mar 29 16:25 partialpseudofasta_17.txt
-rw-rw-r-- 1 staff staff 0 Mar 29 16:25 partialpseudofasta_18.txt
-rw-rw-r-- 1 staff staff 0 Mar 29 16:25 partialpseudofasta_19.txt
-rw-rw-r-- 1 staff staff 1188 Mar 29 16:25 partialpseudofasta_1.txt
-rw-rw-r-- 1 staff staff 0 Mar 29 16:25 partialpseudofasta_20.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_2.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_3.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_4.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_5.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_6.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_7.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_8.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_9.txt
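A quick check on the sizes in the listing supports the point about file #11. This sketch only sums byte counts and ignores where records actually begin, so the chunk count it computes is a lower bound, not a guarantee that parallel could have produced exactly 10 files.

```python
import math

# Sizes in bytes of the non-empty output files from the listing.
sizes = {1: 1188, **{i: 1200 for i in range(2, 10)}, 10: 1092, 11: 13}
BLOCK = 1200  # the --block value used in the original command

total = sum(sizes.values())
# Minimum number of chunks if only size mattered (record boundaries ignored).
min_chunks = math.ceil(total / BLOCK)
# Would file #11 have fit into file #10 without exceeding the block size?
merged_last = sizes[10] + sizes[11]

print(total, min_chunks, merged_last, merged_last <= BLOCK)  # 11893 10 1105 True
```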
Thanks
Owen
On Thu, Mar 29, 2012 at 5:40 PM, Ole Tange <tange@gnu.org> wrote:
> On Fri, Mar 30, 2012 at 1:54 AM, <juncus@gmail.com> wrote:
>> Hello,
>>
>> I don't need to say how great GNU parallel is (GREAT!).
>
> Good to hear.
>
>> But for the
>> first time, I have encountered a behavior I didn't expect from it. I
>> am trying to break up a big input FASTA file (DNA sequence) using the
>> --block and --recstart options. But it always seems to create ONE
>> more file than I really want. I mean, if I have specified 10 jobs (-j
>> 10), and if the block size on the 10th job is still below my
>> specification (--block 1200), why does it make an 11th file? This
>> means that 10 jobs run in parallel, and then 1 MORE job has to run to
>> get the last record.
>
> It sounds like: https://savannah.gnu.org/bugs/?34241
>
> /Ole