From: juncus
Subject: unexpected behavior when using GNU parallel with block and recstart to break up fasta file
Date: Thu, 29 Mar 2012 16:54:47 -0700

Hello,
I don't need to say how great GNU parallel is (GREAT!). But for the
first time, I have encountered a behavior I didn't expect from it. I
am trying to break up a big input FASTA file (DNA sequence) using the
--block and --recstart options. But it always seems to create ONE
more file than I really want. I mean, if I have specified 10 jobs (-j
10), and if the block size on the 10th job is still below my
specification (--block 1200), why does it make an 11th file? This
means that 10 jobs in parallel run, and then 1 MORE job has to run to
get the last record.
From the man page, for the section on --pipe: "...The block read will
have the final partial record removed before the block is passed on to
the job. The partial record will be prepended to next block."
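To make that passage concrete, here is a rough sketch of the trimming it
describes. This is a simplified model under stated assumptions, not GNU
parallel's actual code: the function name model_blocks is made up, a
record start is taken to be a '>' at the beginning of a line, and
whatever is still held back at end of input is emitted as a final block
of its own.

```shell
#!/bin/sh
# Simplified model of the quoted --pipe behaviour (an assumption for
# illustration, not GNU parallel's actual code).

# Print the size in bytes of each block that would be passed to a job.
# $1 = block size in bytes; input is read from stdin.
model_blocks() {
  awk -v block="$1" '
    { data = data $0 "\n" }                 # slurp input, restoring newlines
    END {
      carry = ""
      for (pos = 1; pos <= length(data); pos += block) {
        chunk = carry substr(data, pos, block)
        # Find the last record start: a ">" at the beginning of a line.
        cut = 0
        for (i = length(chunk) - 1; i >= 1; i--)
          if (substr(chunk, i, 2) == "\n>") { cut = i; break }
        if (cut > 0) {
          print cut                          # bytes passed on to the job
          carry = substr(chunk, cut + 1)     # trailing record moves to next block
        } else {
          carry = chunk                      # no boundary seen yet; keep reading
        }
      }
      if (carry != "") print length(carry)   # leftover at end of input
    }'
}

# 1000 two-line records: a ">header" line followed by a number line.
sizes=$(seq 1000 | awk '{ print ">header"; print }' | model_blocks 1200)
echo "$sizes"
```

In this model the record at the end of a block is always held back,
even when the read happens to stop exactly on a record boundary, because
no following '>' has been seen yet. For the 1000-record input it prints
eleven sizes: 1188, eight times 1200, 1092, and finally 13.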
I *think* I understand why it considers the last record to be partial:
is it because I haven't given it a --recend, so it doesn't actually
KNOW that the last record is the last record? But I am not sure how
to specify --recend for a FASTA file. Can anyone help?
Is this possibly a bug?
Many thanks in advance,
Owen
This single line provides a reproducible example (a little weird to
use tee this way, but it's how I am tracking how the blocks are
working):
$ seq 1000 | sed 's/^/>header\n/' | parallel -j 10 --block 1200 \
    --recstart '>' --pipe "tee partialpseudofasta_{#}.txt >/dev/null"
$ ls -l partialpseudofasta_*
-rw-rw-r-- 1 staff staff 1092 Mar 29 16:25 partialpseudofasta_10.txt
-rw-rw-r-- 1 staff staff 13 Mar 29 16:25 partialpseudofasta_11.txt
-rw-rw-r-- 1 staff staff 0 Mar 29 16:25 partialpseudofasta_12.txt
-rw-rw-r-- 1 staff staff 0 Mar 29 16:25 partialpseudofasta_13.txt
-rw-rw-r-- 1 staff staff 0 Mar 29 16:25 partialpseudofasta_14.txt
-rw-rw-r-- 1 staff staff 0 Mar 29 16:25 partialpseudofasta_15.txt
-rw-rw-r-- 1 staff staff 0 Mar 29 16:25 partialpseudofasta_16.txt
-rw-rw-r-- 1 staff staff 0 Mar 29 16:25 partialpseudofasta_17.txt
-rw-rw-r-- 1 staff staff 0 Mar 29 16:25 partialpseudofasta_18.txt
-rw-rw-r-- 1 staff staff 0 Mar 29 16:25 partialpseudofasta_19.txt
-rw-rw-r-- 1 staff staff 1188 Mar 29 16:25 partialpseudofasta_1.txt
-rw-rw-r-- 1 staff staff 0 Mar 29 16:25 partialpseudofasta_20.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_2.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_3.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_4.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_5.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_6.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_7.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_8.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_9.txt
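
As a quick sanity check on the listing above, the arithmetic can be
sketched in shell. The variable names are only illustrative, and awk
stands in for the sed call so the snippet does not depend on how sed
treats \n in the replacement.

```shell
#!/bin/sh
# Total size of the pseudo-FASTA input: 1000 records, each a ">header"
# line followed by a number line.
total=$(( $(seq 1000 | awk '{ print ">header"; print }' | wc -c) ))

# Sum of the non-empty file sizes from the listing above.
listed=$(( 1188 + 8 * 1200 + 1092 + 13 ))

# Blocks needed if size were the only constraint (ceiling division).
block=1200
by_size=$(( (total + block - 1) / block ))

echo "input bytes:          $total"
echo "listed bytes:         $listed"
echo "blocks by size alone: $by_size"
```

Both totals come to 11893 bytes, so nothing is lost, and by size alone
10 blocks of 1200 bytes would be enough. The 11th file holds exactly
the final 13-byte record.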
$ parallel --version
GNU parallel 20120322
Copyright (C) 2007,2008,2009,2010,2011,2012 Ole Tange and Free
Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
GNU parallel comes with no warranty.
Web site: http://www.gnu.org/software/parallel
When using GNU Parallel for a publication please cite:
O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
;login: The USENIX Magazine, February 2011:42-47.