Re: Processing files from a tar archive in parallel
From: Benjamin R. Haskell
Subject: Re: Processing files from a tar archive in parallel
Date: Tue, 29 Mar 2011 11:41:23 -0400 (EDT)
User-agent: Alpine 2.01 (LNX 1266 2009-07-14)
On Tue, 29 Mar 2011, Hans Schou wrote:
> On Tue, 29 Mar 2011, Jay Hacker wrote:
>> I have a large gzipped tar archive containing many small files; just
>> untarring it takes a lot of time and space. I'd like to be able to
>> process each file in the archive, ideally without untarring the whole
>> thing first, and I'd like to process several files in parallel. Is
>> there a recipe for this with GNU Parallel?
>
>   tar xvf big-file.tar.gz | parallel echo "Proc this file {}"
>
> Parallel will start when the first file is untared.
Wow. Glad I was (hopefully) so wrong. (I should point out that the
last time I wanted to do this, I'd not yet discovered Parallel.)
Hans, you left off the 'z' in 'tar zxvf':
tar zxvf big-file.tar.gz | parallel echo "Proc this file {}"
Jay, you probably also want to 'rm' the files as you go, since space
sounds like an issue.
And, unfortunately, there seems to be a timing issue: 'tar' prints each
name before it has finished writing the file. If the individual files
are large, you might have a job started before the file is fully there.
Tested via:
$ cd /tmp
$ mkdir foo
# create 5 1-GB files
$ seq 1 5 | parallel dd if=/dev/zero of=foo/{} bs=1G count=1
$ tar -zcvf foo.tgz foo/*
$ rm -rf foo
$ tar -zxvf foo.tgz | parallel 'ls -l {} && rm {}'
parallel: Warning: Starting 10 extra processes takes > 2 sec.
Consider adjusting -j. Press CTRL-C to stop.
-rw------- 1 bhaskell users 1073741824 2011-03-29 11:28 foo/1
tar: foo/2: Cannot utime: No such file or directory
-rw------- 1 bhaskell users 4652032 2011-03-29 11:38 foo/2
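
In the output above, 'ls' saw foo/2 at only 4652032 bytes, and 'tar' then failed on it because 'rm' had already removed it. One possible way around the race is to hold each name back by one line, so the job for a file is released only once 'tar' announces the next file (or exits). This is a minimal sketch, assuming 'tar' prints a name only after the previous member is completely written; `delay_one_line` is a hypothetical helper defined here, not part of tar or GNU Parallel:

```shell
# Hypothetical helper: echo stdin with a one-line lag.
# Line N is printed only after line N+1 has been read; the final
# line is printed at end of input (i.e. once tar has exited).
delay_one_line() {
    prev=
    have_prev=0
    while IFS= read -r line; do
        if [ "$have_prev" -eq 1 ]; then
            printf '%s\n' "$prev"   # previous name is now safe to release
        fi
        prev=$line
        have_prev=1
    done
    if [ "$have_prev" -eq 1 ]; then
        printf '%s\n' "$prev"       # flush the last name at EOF
    fi
}
```

With the function defined in the current shell, the test pipeline would become:

  tar -zxvf foo.tgz | delay_one_line | parallel 'ls -l {} && rm {}'

If the archive also contains directory entries, their names are passed through as well, so the job would need to skip them.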