RE: Processing files from a tar archive in parallel
From: Cook, Malcolm
Subject: RE: Processing files from a tar archive in parallel
Date: Tue, 29 Mar 2011 16:34:37 -0500
Hmmm,
use tar -t to list the filenames, pipe that into parallel to call tar again to
extract just that one file to stdout, and pipe it to some other command:
tar -tzf big-file.tar.gz | parallel tar -xzOf big-file.tar.gz {} '|' someCommandThatReadsFromStdIn
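A minimal self-contained sketch of the same list-then-extract idea follows. It builds a tiny demo archive, lists its members with `tar -t`, and in each job extracts one member to stdout with `tar -x -O` and pipes it into a consumer. Assumptions: `wc -c` stands in for someCommandThatReadsFromStdIn, and `xargs -P` is swapped in for GNU parallel only so the sketch runs where parallel is not installed. The archive contents and paths are hypothetical.

```shell
#!/bin/sh
# Sketch: list a .tar.gz's members, then extract each member to stdout in
# its own job and pipe it into a consumer. wc -c stands in for
# someCommandThatReadsFromStdIn; xargs -P stands in for GNU parallel.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical demo archive with two small files.
mkdir "$work/src"
printf 'hello\n' > "$work/src/a.txt"
printf 'hello world\n' > "$work/src/b.txt"
archive="$work/demo.tar.gz"
tar -czf "$archive" -C "$work/src" a.txt b.txt

# tar -t lists member names; directory entries end in '/', so drop them.
# In each job, tar -x -O (to stdout) extracts just one member, which is
# piped into the consumer. Jobs may finish in any order, hence the sort.
result=$(
  tar -tzf "$archive" | grep -v '/$' |
    xargs -P 4 -I{} sh -c 'printf "%s %s\n" "$2" "$(tar -xzOf "$1" "$2" | wc -c | tr -d " ")"' sh "$archive" {} |
    sort
)
printf '%s\n' "$result"
```

Note that each job has to decompress the gzip stream from the beginning to reach its member, so this avoids writing files to disk at the cost of repeated decompression. With GNU parallel installed, the xargs stage corresponds to `parallel tar -xzOf "$archive" {} '|' wc -c`.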
Malcolm Cook
Stowers Institute for Medical Research - Bioinformatics
Kansas City, Missouri USA
> -----Original Message-----
> From: parallel-bounces+mec=stowers.org@gnu.org
> [mailto:parallel-bounces+mec=stowers.org@gnu.org] On Behalf Of Ole Tange
> Sent: Tuesday, March 29, 2011 4:14 PM
> To: Jay Hacker
> Cc: parallel@gnu.org
> Subject: Re: Processing files from a tar archive in parallel
>
> On Tue, Mar 29, 2011 at 10:14 PM, Jay Hacker
> <jayqhacker@gmail.com> wrote:
> > On Tue, Mar 29, 2011 at 11:20 AM, Hans Schou <chlor@schou.dk> wrote:
> >> On Tue, 29 Mar 2011, Jay Hacker wrote:
> >>
> >>> I have a large gzipped tar archive containing many small files; just
> >>> untarring it takes a lot of time and space. I'd like to be able to
> >>> process each file in the archive, ideally without untarring the
> >>> whole thing first,
> :
> >> tar xvf big-file.tar.gz | parallel echo "Proc this file {}"
> >>
> >> Parallel will start when the first file is untarred.
> :
> > That is a great idea. However, can I be sure the file is completely
> > written to disk before tar prints the filename?
>
> While I loved Hans' idea, it does indeed have a race
> condition. This should run 'ls -l' on each file after
> decompressing and clearly fails now and then:
>
> $ tar xvf ../i.tgz | parallel ls -l > ls-l
> ls: cannot access 1792: No such file or directory
> ls: cannot access 209: No such file or directory
> ls: cannot access 21: No such file or directory
> ls: cannot access 2256: No such file or directory
> ls: cannot access 2349: No such file or directory
> ls: cannot access 2363: No such file or directory
> ls: cannot access 246: No such file or directory
> ls: cannot access 2712: No such file or directory
>
> But you could unpack in a new dir and use:
> http://www.gnu.org/software/parallel/man.html#example__gnu_parallel_as_dir_processor
>
> That seems to work.
>
> /Ole
>
>
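Ole's `ls -l` run shows that `tar xvf` can print a name before that file is fully on disk. One possible workaround, sketched here and not taken from the thread, is to hold back each printed name until tar prints the next one (or exits). This assumes GNU tar, which extracts members one at a time and writes its `-v` listing to stdout, so a regular file should be complete by the time its name is released. The helper `lag_by_one`, the demo archive, and the use of `xargs -P` in place of GNU parallel are all hypothetical choices for illustration.

```shell
#!/bin/sh
# Sketch, not from the thread: hold back each name that tar -v prints until
# the next name appears (or tar exits). Assumes GNU tar, which extracts one
# member at a time and lists names on stdout, so a held-back regular file
# should be complete by the time it is released.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical demo archive with three small files.
mkdir "$work/src" "$work/out"
for n in 1 2 3; do
  printf 'file %s\n' "$n" > "$work/src/f$n.txt"
done
tar -czf "$work/demo.tar.gz" -C "$work/src" f1.txt f2.txt f3.txt

# Hypothetical helper: pass names through with a delay of one line. A name
# is printed only once the following name has been read, and the last one
# only when tar closes its output.
lag_by_one() {
  prev=
  while IFS= read -r name; do
    if [ -n "$prev" ]; then printf '%s\n' "$prev"; fi
    prev=$name
  done
  if [ -n "$prev" ]; then printf '%s\n' "$prev"; fi
}

# Extract into a fresh directory and process released files in parallel;
# wc -c is a placeholder for the real per-file work.
result=$(
  tar -xvzf "$work/demo.tar.gz" -C "$work/out" |
    lag_by_one | grep -v '/$' |
    xargs -P 4 -I{} sh -c 'printf "%s %s\n" "$2" "$(wc -c < "$1/$2" | tr -d " ")"' sh "$work/out" {} |
    sort
)
printf '%s\n' "$result"
```

This keeps the "unpack in a new dir" part of Ole's suggestion but starts work while tar is still running; each file simply waits for the next member to begin. Other tar implementations may print the listing differently (for example to stderr, or with a prefix), so the pipeline would need adjusting there.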