parallel
GNU Parallel seems to drop


From: Dirk Eddelbuettel
Subject: GNU Parallel seems to drop
Date: Tue, 25 Sep 2012 03:53:18 +0000 (UTC)
User-agent: Loom/3.14 (http://gmane.org/)

Hi Ole,

I have some large jobs in which a file is piped into awk, and awk then splits
the large file into distinct files based on a token found on the line.

To make matters concrete, imagine a file

   A B foo C D
   E F foo G H
   I J giz K L
   M N foo O P
   Q R giz S T

where the 1st, 2nd and 4th lines go to the file data/foo, and the 3rd and 5th
to data/giz.
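
For illustration only, the serial split could be sketched roughly like this.
It assumes whitespace-separated fields with the token in the third column, as
in the sample above; the real data uses ':' as separator and the real awk
script is not shown here, so the field number and the inline sample input are
assumptions made for this sketch.

```shell
# Sketch only: route each line to data/<token>, keyed on the 3rd field.
# Field position and sample input are assumptions taken from the toy example.
mkdir -p data

printf '%s\n' \
  'A B foo C D' \
  'E F foo G H' \
  'I J giz K L' \
  'M N foo O P' \
  'Q R giz S T' |
awk '{
  out = "data/" $3   # build the target file name from the token
  print > out        # awk truncates on first open, then keeps appending
}'
```

Within a single awk process, `>` truncates a file only when it is first
opened, so all three foo lines end up in data/foo and both giz lines in
data/giz.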

I would like to parallelize this.  And instead of

  zcat foo.gz | awk -v v1=A v2=B -F: '....'

I tried (several variations, ending with)

  zcat foo.gz | parallel --pipe -- awk -v v1=A v2=B -F: -f script.awk

which should avoid most shell quoting headaches.  Unfortunately, parallel seems
to swallow a lot of lines.  I started with approx. 670 MB, and the parallel
approach only yields about 3.  Ouch.  I am obviously doing something wrong
here, but what is it?

I started with the current Debian package, 20120422, and just tried the most
recent release, 20120822, which did not change things.

Thanks for writing and supporting parallel. It looks rather useful.

Cheers, Dirk

PS It would be nice if you also provided .info documents. I still like those as 
my go-to docs when in Emacs. I tried makeinfo on your .texi files, but there 
seems to be some metadata missing.


 



