
Re: GNU Parallel Bug Reports Truncated large records

From: Ole Tange
Subject: Re: GNU Parallel Bug Reports Truncated large records
Date: Tue, 24 Feb 2015 02:30:32 +0100

On Mon, Feb 23, 2015 at 2:28 PM, Johannes Dröge
<address@hidden> wrote:
> Hi Ole and GNU parallel devs,
> I'm processing large files (~50 GiB) with variable record sizes and have the 
> following issues:
> 1) The processing run-time of individual blocks grows more than linearly with 
> the input size. It would therefore be best if GNU parallel allowed passing 
> single records or a fixed number of records to each job, or at least did not 
> automatically increase the block size. Instead, the block size auto-detection 
> increases the block size on large individual blocks until only very few 
> processes run in parallel, and these then dominate the overall run-time. This 
> behavior strongly impacts the granularity of the parallel execution.
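As a point of reference, a fixed --block keeps GNU parallel from choosing block sizes on its own; a minimal sketch (with `wc -c` standing in for the real per-block command):

```shell
# Hedged sketch: pin the block size with --block so it stays constant
# rather than relying on any automatic adjustment. 'wc -c' is a
# stand-in for the real per-block command; -k keeps output in
# input order.
printf 'rec1\nrec2\nrec3\nrec4\n' |
  parallel -k --pipe --block 1M --recend '\n' wc -c
```

With 20 bytes of input and a 1M block, everything lands in a single job, so `wc -c` reports 20.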

I would recommend using -N, but it segfaults on your example data.
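For context, -N with --pipe is meant to pass a fixed number of records to each job; a minimal sketch of the intended usage (once the segfault is resolved):

```shell
# Sketch: with --pipe, -N2 hands each job exactly 2
# newline-terminated records; -k preserves input order in the output.
printf '1\n2\n3\n4\n' | parallel -k --pipe -N2 --recend '\n' cat
```

Here the first job receives records "1" and "2", the second receives "3" and "4".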

> 2) I'm seeing that large records (>2 GiB) are being truncated at 2 GiB and 
> thus passed incompletely via stdin.

I can reproduce this. The output is truncated to 2 GB minus 4 KB.

That is a serious problem. Fixed for your task in git version c445232:

  git clone git://git.savannah.gnu.org/parallel.git

The fix is not a general fix.
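One way to check whether records survive the pipe intact is to compare input and output byte counts; a hedged sketch, shown on a small stand-in file since the reported truncation only appears with records over 2 GiB:

```shell
# Hedged sketch: detect truncation by comparing byte counts before
# and after piping through parallel. 'stand_in.txt' is a small
# placeholder; the actual bug only manifests with single records
# larger than 2 GiB.
printf 'one large record without a trailing newline' > stand_in.txt
in_bytes=$(wc -c < stand_in.txt)
out_bytes=$(parallel --pipe --recend '' cat < stand_in.txt | wc -c)
if [ "$in_bytes" -eq "$out_bytes" ]; then
  echo "intact"
else
  echo "truncated: $out_bytes of $in_bytes bytes"
fi
rm -f stand_in.txt
```

With --recend '' the input is split only at block boundaries, so the whole stand-in file passes through as a single record.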

