Re: GNU Parallel Bug Reports recstart finds spurious empty records
From: Ole Tange
Subject: Re: GNU Parallel Bug Reports recstart finds spurious empty records
Date: Wed, 19 Sep 2012 13:50:50 +0200
On Tue, Sep 18, 2012 at 6:00 AM, Martin Frith <address@hidden> wrote:
> Hello,
>
> I'm using GNU parallel 20120822, and I tested it like this:
>
> ##################
> echo ">" > junk.fasta
>
> seq 10000000 >> junk.fasta
>
> cat junk.fasta junk.fasta > junks.fasta
>
> parallel --pipe --recstart ">" wc -c < junks.fasta
> ##################
>
> There are only 2 records, but it finds lots of extra records of size zero.
See: http://savannah.gnu.org/bugs/?34241
> Related to this, parallel seems to become slow when the records are much
> bigger than the block size.
That is likely: parallel reads a chunk of block-size bytes at a time. If
it cannot find a single complete record in that chunk, it has to append
yet another block of the same size. This gives performance of O(n^2) in
the number of blocks a record spans. I do not see a way to avoid that.
For performance reasons you should set the block size bigger than one
record.
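
To make the quadratic cost concrete, here is a rough back-of-the-envelope
sketch. It assumes a simplified model in which the whole buffer is looked at
again each time a block is appended; that is an illustration only, not a
description of what parallel actually does internally. The function name
scan_cost and the record size of roughly 79 MB (estimated from the seq
example above) are assumptions made for this sketch.

```shell
# Hypothetical model: a record of $1 bytes is read in blocks of $2 bytes,
# and the buffer is re-examined from the start after every appended block.
# Upper bound on bytes examined: block * (1 + 2 + ... + k), k = blocks needed.
scan_cost() {
  sc_record=$1
  sc_block=$2
  # Number of blocks needed to cover one record (rounded up).
  sc_blocks=$(( (sc_record + sc_block - 1) / sc_block ))
  echo $(( sc_block * sc_blocks * (sc_blocks + 1) / 2 ))
}

# Record of about 79 MB (estimated size of one record in junks.fasta).
scan_cost 78888899 1048576     # 1 MiB blocks: many appends per record
scan_cost 78888899 104857600   # 100 MiB blocks: one block holds the record
```

Under this model the first call examines far more bytes than the second. For
the test case above, a block size larger than one record, for example
something like --block 100M, should keep each record within a single block.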
/Ole