bug-parallel

Re: GNU Parallel Bug Reports recstart finds spurious empty records


From: Ole Tange
Subject: Re: GNU Parallel Bug Reports recstart finds spurious empty records
Date: Wed, 19 Sep 2012 13:50:50 +0200

On Tue, Sep 18, 2012 at 6:00 AM, Martin Frith <address@hidden> wrote:
> Hello,
>
> I'm using GNU parallel 20120822, and I tested it like this:
>
> ##################
> echo ">" > junk.fasta
>
> seq 10000000 >> junk.fasta
>
> cat junk.fasta junk.fasta > junks.fasta
>
> parallel --pipe --recstart ">" wc -c < junks.fasta
> ##################
>
> There are only 2 records, but it finds lots of extra records of size zero.

See: http://savannah.gnu.org/bugs/?34241

> Related to this, parallel seems to become slow when the records are much
> bigger than the block size.

That is likely: Parallel reads one chunk of block-size bytes at a time.
If it cannot find a complete record in that chunk, it has to append
another block of the same size and look again. For records much bigger
than the block size this gives O(n^2) performance. I do not see a way
to avoid that. For performance reasons you should set the block size
bigger than one record.
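
To put a rough number on the quadratic cost, here is a small sketch. It is only a model, not Parallel's actual code: it assumes the whole buffer is searched again each time a block is appended. The function name rescanned_bytes and the 1 MiB block size are assumptions made for the illustration; the record size is that of junk.fasta from the test above.

```shell
#!/bin/sh
# Illustrative model only (not GNU Parallel's implementation):
# assume the whole buffer is rescanned every time one more block is appended.
# rescanned_bytes RECORD_SIZE BLOCK_SIZE  (hypothetical helper, sizes in bytes)
rescanned_bytes() (
  record=$1
  block=$2
  # Number of blocks needed before the buffer holds one full record
  blocks=$(( (record + block - 1) / block ))
  # Buffers of 1, 2, ..., blocks blocks are scanned in turn
  echo $(( block * blocks * (blocks + 1) / 2 ))
)

# One record of junk.fasta (78888899 bytes), assumed block size of 1 MiB
rescanned_bytes 78888899 1048576
# The same record with a block bigger than the record: a single pass
rescanned_bytes 78888899 104857600
```

Under this model the first call scans about 3 GB to find one 79 MB record, while the second scans a single 100 MiB block. For the test above that would mean something like `parallel --pipe --block 100M --recstart ">" wc -c < junks.fasta`, with 100M chosen only because it is bigger than one record.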


/Ole


