parallel

Re: GNU Parallel seems to drop data


From: Dirk Eddelbuettel
Subject: Re: GNU Parallel seems to drop data
Date: Tue, 25 Sep 2012 13:48:10 +0000 (UTC)
User-agent: Loom/3.14 (http://gmane.org/)

Ole Tange <ole <at> tange.dk> writes:
> On Tue, Sep 25, 2012 at 1:50 PM, Dirk Eddelbuettel <edd <at> debian.org> wrote:
> 
> > Well a little "apt-get install gawk-doc" and two seconds of searching led to
> > the '>>' operator to append to files ... and tada, it now works.
> 
> Depending on how it appends that may not work. Do you know for sure it
> flushes for every record? Otherwise you may get half-records.
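A minimal sketch of the append-and-flush pattern being discussed, for readers following along. It assumes (this is not stated in the thread) that records are newline-terminated lines and that the first field picks the output file; `split_by_key` and the `.txt` naming are hypothetical. Each job in a parallel run is a separate awk process, so `>` would truncate a file on its first write from that job, discarding what other jobs already wrote, while `>>` opens it in append mode. `fflush()` then hands each record to the kernel right away instead of leaving part of it in awk's buffer.

```shell
# split_by_key DIR (hypothetical helper)
# Reads lines on stdin and appends each one to DIR/<first field>.txt.
# DIR must already exist.
split_by_key() {
    awk -v dir="$1" '{
        out = dir "/" $1 ".txt"   # build the name first; avoids ">>" parsing ambiguity
        print >> out              # append mode: never truncates an existing file
        fflush(out)               # flush this record now rather than at buffer-full or exit
    }'
}
```

For example, `printf 'a 1\nb 2\na 3\n' | split_by_key /tmp/out` appends two lines to `/tmp/out/a.txt` and one to `/tmp/out/b.txt`. Flushing per record makes each record go out as its own small write to an append-mode file, which on local filesystems is generally expected not to interleave with other writers; that is an expectation worth confirming with a checksum comparison rather than assuming.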

Yes, now that I am in the office with my actual data, that verification is the
next step. I probably also need the '-k' switch [ does that have "significant"
performance implications? ] to ensure the order is the same, which is important
for the subsequent "munging" of the appropriately split files.

> If these give the same output, then you are golden. If not, you may
> have half-records in the parallel data.
> 
> parallel -k --tag 'sort {} | md5sum' ::: dataSerial/*
> parallel -k --tag 'sort {} | md5sum' ::: dataParallel/*

Brilliant idea to compare via md5sum.  Quicker than my formal munging. 
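The same check can also be scripted as a single pass/fail test. This is a sketch in plain POSIX shell with coreutils, assuming the serial and parallel runs produce same-named files in two directories; `compare_dirs` is a hypothetical name. Because each file is sorted before hashing, the test catches dropped or half-written records but not a change in record order. If order matters for the later munging (the reason for '-k' above), drop the `sort` to compare files byte for byte.

```shell
# compare_dirs DIR1 DIR2 (hypothetical helper)
# For every file in DIR1, hash its sorted lines and the sorted lines of the
# same-named file in DIR2. Prints "DIFF <name>" for each mismatch or missing
# file and returns 1 if any were found, 0 otherwise.
# Files that exist only in DIR2 are not checked.
compare_dirs() {
    status=0
    for f in "$1"/*; do
        [ -f "$f" ] || continue            # skip an unexpanded glob and subdirectories
        name=$(basename "$f")
        # LC_ALL=C gives the same sort order regardless of the user's locale
        a=$(LC_ALL=C sort "$f" | md5sum | cut -d' ' -f1)
        if [ -f "$2/$name" ]; then
            b=$(LC_ALL=C sort "$2/$name" | md5sum | cut -d' ' -f1)
        else
            b=missing
        fi
        if [ "$a" != "$b" ]; then
            echo "DIFF $name"
            status=1
        fi
    done
    return "$status"
}
```

Usage: `compare_dirs dataSerial dataParallel && echo identical`. Running it a second time with the arguments swapped also catches files that exist only in the parallel output.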

Dirk
