From: juncus
Subject: unexpected behavior when using GNU parallel with block and recstart to break up fasta file
Date: Thu, 29 Mar 2012 16:54:47 -0700

Hello,

I don't need to say how great GNU parallel is (GREAT!).  But for the
first time, I have encountered a behavior I didn't expect from it.  I
am trying to break up a big input FASTA file (DNA sequence) using the
--block and --recstart options.  But it always seems to create ONE
more file than I really want.  I mean, if I have specified 10 jobs (-j
10), and if the block size on the 10th job is still below my
specification (--block 1200), why does it make an 11th file?  This
means that 10 jobs in parallel run, and then 1 MORE job has to run to
get the last record.

From the man page, for the section on --pipe: "...The block read will
have the final partial record removed before the block is passed on to
the job. The partial record will be prepended to next block."

I *think* I understand why it considers the last record to be partial:
is it because I haven't given it a --recend, so it doesn't actually
know that the last record is the last record? But I am not sure how
to specify --recend for a FASTA file.  Can anyone help?
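
One way to test that reading of the man page is a toy model of the
splitting. The sketch below is not GNU parallel's actual implementation.
It assumes that input is read in chunks of exactly --block bytes, that a
record starts at every line beginning with '>', and that the last record
in every chunk (including the final, short one) is held back as partial.
The function name simulate_blocks is made up for illustration, and awk
stands in for the sed call only to avoid relying on \n in the
replacement text.

```shell
# Toy model only: prints the size in bytes of each block that would be
# handed to a job under the assumptions stated above.
simulate_blocks() {
  # $1 = block size in bytes; the input is read from stdin.
  awk -v block="$1" '
    BEGIN { limit = block + 0; pos = 0; emitted = 0; last_start = 0 }
    /^>/ {
      # A record starts at byte offset pos. If it lies at or beyond the
      # end of the current chunk, cut at the last record start seen so far.
      while (pos >= limit) {
        if (last_start > emitted) {
          print last_start - emitted
          emitted = last_start
        }
        limit += block
      }
      last_start = pos
    }
    { pos += length($0) + 1 }   # +1 for the newline
    END {
      # The final short chunk: hold back its last record, then flush it.
      if (last_start > emitted) print last_start - emitted
      print pos - last_start
    }
  '
}

seq 1000 | awk '{ print ">header"; print }' | simulate_blocks 1200
```

Under these assumptions the model prints 1188, then 1200 eight times,
then 1092 and 13, which matches the sizes of files 1 to 11 in the
listing below.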

Is this possibly a bug?

Many thanks in advance,
Owen

This single line provides a reproducible example (a little weird to
use tee this way, but it's how I am tracking how the blocks are
working):

$ seq 1000 | sed 's/^/>header\n/' | parallel -j 10 --block 1200 \
    --recstart '>' --pipe "tee partialpseudofasta_{#}.txt >/dev/null"

$ ls -l partialpseudofasta_*

-rw-rw-r-- 1 staff staff 1092 Mar 29 16:25 partialpseudofasta_10.txt
-rw-rw-r-- 1 staff staff   13 Mar 29 16:25 partialpseudofasta_11.txt
-rw-rw-r-- 1 staff staff    0 Mar 29 16:25 partialpseudofasta_12.txt
-rw-rw-r-- 1 staff staff    0 Mar 29 16:25 partialpseudofasta_13.txt
-rw-rw-r-- 1 staff staff    0 Mar 29 16:25 partialpseudofasta_14.txt
-rw-rw-r-- 1 staff staff    0 Mar 29 16:25 partialpseudofasta_15.txt
-rw-rw-r-- 1 staff staff    0 Mar 29 16:25 partialpseudofasta_16.txt
-rw-rw-r-- 1 staff staff    0 Mar 29 16:25 partialpseudofasta_17.txt
-rw-rw-r-- 1 staff staff    0 Mar 29 16:25 partialpseudofasta_18.txt
-rw-rw-r-- 1 staff staff    0 Mar 29 16:25 partialpseudofasta_19.txt
-rw-rw-r-- 1 staff staff 1188 Mar 29 16:25 partialpseudofasta_1.txt
-rw-rw-r-- 1 staff staff    0 Mar 29 16:25 partialpseudofasta_20.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_2.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_3.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_4.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_5.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_6.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_7.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_8.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_9.txt
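
The sizes above can be cross-checked without running parallel. This is
a small sketch: it rebuilds the same input (using awk instead of sed, a
substitution made only for portability), measures it, and computes how
many 1200-byte blocks would be needed if the data could be packed with
no loss. The names make_input, total_bytes, last_record_bytes and
min_blocks are made up for illustration.

```shell
# Rebuild the pseudo-FASTA input from the example.
make_input() {
  seq 1000 | awk '{ print ">header"; print }'
}

# Total input size, and the size of the final two-line record.
total_bytes=$(make_input | wc -c | tr -d ' ')
last_record_bytes=$(make_input | tail -n 2 | wc -c | tr -d ' ')

# Smallest number of 1200-byte blocks that could hold the whole input.
min_blocks=$(( (total_bytes + 1200 - 1) / 1200 ))

echo "total=$total_bytes last_record=$last_record_bytes min_blocks=$min_blocks"
```

The total is 11893 bytes, which equals the sum of the eleven non-empty
files. The final record is 13 bytes, the size of file 11, and 10 blocks
of 1200 bytes would be enough to hold everything.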


$ parallel --version
GNU parallel 20120322
Copyright (C) 2007,2008,2009,2010,2011,2012 Ole Tange and Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
GNU parallel comes with no warranty.

Web site: http://www.gnu.org/software/parallel

When using GNU Parallel for a publication please cite:

O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
;login: The USENIX Magazine, February 2011:42-47.


