
Re: GNU Parallel seems to drop


From: Dirk Eddelbuettel
Subject: Re: GNU Parallel seems to drop
Date: Tue, 25 Sep 2012 11:50:44 +0000 (UTC)
User-agent: Loom/3.14 (http://gmane.org/)

Dirk Eddelbuettel <edd <at> debian.org> writes:
> Ole Tange <ole <at> tange.dk> writes:
> > If 2 awk scripts both open A, B and C then the last one wins and all
> > data written by the first one is lost.
> 
> Plonk. I think that may indeed be the case. I had not thought that through.
> I have to find a tool that does this in append mode.

Well, a little "apt-get install gawk-doc" and two seconds of searching led to 
the '>>' operator to append to files ... and tada, it now works.
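A minimal sh sketch of the failure mode Ole describes, not part of the original session. Two separate awk processes each get one "chunk" of input, standing in for the chunks `parallel --pipe` hands out. They run one after the other so the result is deterministic. With `>`, the second process truncates each file when it first opens it. With `>>`, both chunks survive. The toy input and the temporary directory layout are hypothetical.

```shell
#!/bin/sh
# Sketch: why '>' loses data when several awk processes write the same files.
# The two "chunks" stand in for what parallel --pipe hands to separate awk
# processes; they run sequentially here so the outcome is deterministic.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/trunc" "$work/append"

printf 'A 1\nB 2\nA 3\n' > "$work/chunk1.txt"   # hypothetical first chunk
printf 'A 4\nB 5\n' > "$work/chunk2.txt"        # hypothetical second chunk

for chunk in "$work/chunk1.txt" "$work/chunk2.txt"; do
    # '>' truncates each file the first time *this* awk process opens it,
    # discarding whatever an earlier process wrote there.
    awk -v path="$work/trunc" '{ print $0 > (path "/" $1 ".txt") }' "$chunk"
    # '>>' opens the file in append mode, so earlier output is kept.
    awk -v path="$work/append" '{ print $0 >> (path "/" $1 ".txt") }' "$chunk"
done

# Count the lines that survived, summed over all output files.
kept_trunc=$(awk 'END { print NR }' "$work"/trunc/*.txt)
kept_append=$(awk 'END { print NR }' "$work"/append/*.txt)
echo "with '>':  $kept_trunc of 5 lines kept"
echo "with '>>': $kept_append of 5 lines kept"
```

Because `>>` never truncates, it also appends to output left over from an earlier run. That is why the session below starts by clearing both output directories.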

edd@max:/tmp/parallel$ rm dataSerial/* dataParallel/*
edd@max:/tmp/parallel$ 
edd@max:/tmp/parallel$ cat data.txt | \
         awk -v path=dataSerial '{print $0 > (path "/" $1 ".txt")}'
edd@max:/tmp/parallel$ cat data.txt | \
         parallel --pipe -- awk -v path=dataParallel -f script.awk
edd@max:/tmp/parallel$ wc -l dataSerial/*
  199762 dataSerial/A.txt
  200031 dataSerial/B.txt
  200283 dataSerial/C.txt
  199845 dataSerial/D.txt
  200079 dataSerial/E.txt
 1000000 total
edd@max:/tmp/parallel$ wc -l dataParallel/*
  199762 dataParallel/A.txt
  200031 dataParallel/B.txt
  200283 dataParallel/C.txt
  199845 dataParallel/D.txt
  200079 dataParallel/E.txt
 1000000 total
edd@max:/tmp/parallel$ 

with 

edd@max:/tmp/parallel$ cat script.awk 
{ 
    print $0 >> (path "/" $1 ".txt")
}
edd@max:/tmp/parallel$ 
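`wc -l` only shows that the line counts match. A somewhat stronger check, offered as an assumption rather than something from the original session, sorts each pair of files and compares them byte for byte. Sorting matters because chunks processed under `parallel --pipe` can land in a different order than in the serial run. The helper names `same_lines` and `compare_dirs` are hypothetical.

```shell
#!/bin/sh
# Sketch: confirm that two output directories hold the same lines per file,
# ignoring line order. Helper names are hypothetical, not from the thread.
set -eu

# Succeeds if files $1 and $2 contain the same lines, in any order.
same_lines() {
    sa=$(mktemp); sb=$(mktemp)
    LC_ALL=C sort "$1" > "$sa"
    LC_ALL=C sort "$2" > "$sb"
    rc=0
    cmp -s "$sa" "$sb" || rc=1
    rm -f "$sa" "$sb"
    return "$rc"
}

# Compares every *.txt in directory $1 with the file of the same name in $2.
# Returns non-zero if any file is missing or differs.
compare_dirs() {
    bad=0
    for f in "$1"/*.txt; do
        name=$(basename "$f")
        if [ -f "$2/$name" ] && same_lines "$f" "$2/$name"; then
            echo "OK   $name"
        else
            echo "DIFF $name"
            bad=1
        fi
    done
    return "$bad"
}

# Usage, from the session's working directory:
#   compare_dirs dataSerial dataParallel
```

This only walks the files in the first directory, so it will not notice extra files that exist only in the second one.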
 

For reference and completeness, the data generator was the R script below:

edd@max:/tmp/parallel$ cat createData.r 
#!/usr/bin/Rscript

N <- 1e6
set.seed(42)

df <- data.frame(key=sample(LETTERS[1:5], N, replace=TRUE),
                 value=rnorm(N))

write.table(df, file="/tmp/parallel/data.txt", 
            row.names=FALSE, col.names=FALSE, quote=FALSE)
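Readers without R can use a rough awk stand-in to get a file of the same shape (`<key> <value>` per line, keys A to E). This is a substitute, not the original generator. Keys are drawn uniformly like `sample(LETTERS[1:5], N, replace=TRUE)`, but the values are uniform on [0,1) from awk's `rand()` rather than normal draws from `rnorm`. awk's random stream also differs from R's, so the per-file counts will not match the ones shown above. The function name `make_data` is hypothetical.

```shell
#!/bin/sh
# Sketch: write $1 lines of "<key> <value>" to file $2, keys drawn from A..E.
# Stand-in for createData.r only: values are uniform on [0,1), NOT normal.
set -eu

make_data() {
    awk -v n="$1" 'BEGIN {
        srand(42)                          # fixed seed, repeatable per awk
        split("A B C D E", keys, " ")      # keys[1] .. keys[5]
        for (i = 0; i < n; i++) {
            k = keys[int(rand() * 5) + 1]  # uniform index in 1..5
            printf "%s %.6f\n", k, rand()
        }
    }' > "$2"
}

# Usage:
#   make_data 1000000 /tmp/parallel/data.txt
```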


Thanks,  Dirk



