Re: Processing files from a tar archive in parallel

From: Benjamin R. Haskell
Subject: Re: Processing files from a tar archive in parallel
Date: Tue, 29 Mar 2011 11:10:37 -0400 (EDT)
User-agent: Alpine 2.01 (LNX 1266 2009-07-14)

On Tue, 29 Mar 2011, Jay Hacker wrote:

> I have a large gzipped tar archive containing many small files; just untarring it takes a lot of time and space. I'd like to be able to process each file in the archive, ideally without untarring the whole thing first, and I'd like to process several files in parallel. Is there a recipe for this with GNU Parallel?

Whether or not you're using Parallel, .tar.gz isn't a format that easily allows random access to the files it contains: the members are stored back to back with no index, and the gzip layer means you have to decompress from the start to reach any of them. You might be better off with an archive format that has an index, such as 7z or zip. Then you could parallelize over the list of filenames it contains.
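To illustrate the indexed-archive idea, here is a minimal sketch in Python, assuming the data has been repacked as a zip file. The function names (`process_member`, `process_zip_in_parallel`) and the "processing" step (just counting bytes) are made up for the example; it is not a GNU Parallel recipe.

```python
import zipfile
from concurrent.futures import ThreadPoolExecutor


def process_member(archive_path, name):
    """Hypothetical per-file job: read one member and return its size."""
    # Each task opens its own handle, so workers never share a file position.
    with zipfile.ZipFile(archive_path) as zf:
        data = zf.read(name)
    return name, len(data)


def process_zip_in_parallel(archive_path, workers=4):
    """Read the member names from the zip index, then fan the work out."""
    with zipfile.ZipFile(archive_path) as zf:
        # The index gives us the names without decompressing any file data.
        names = [info.filename for info in zf.infolist() if not info.is_dir()]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda n: process_member(archive_path, n), names)
        return dict(results)
```

Threads keep the sketch short; for CPU-heavy processing, a process pool would likely be the better fit.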

Anything else (e.g. extracting the whole archive sequentially, processing and deleting each file as you go) would probably require a fair amount of programming/scripting. (I'd gladly be proven wrong here -- I've wanted to do things like this myself.)
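For what it's worth, a rough sketch of the sequential approach might look like the following, assuming Python's standard `tarfile` module in streaming mode (`"r|gz"`). The names `process_bytes` and `process_tar_stream` are hypothetical, and the per-file work is again just a byte count. The archive is still read front to back by a single reader; only the processing runs in parallel.

```python
import tarfile
from concurrent.futures import ThreadPoolExecutor


def process_bytes(name, data):
    """Hypothetical per-file job: return the member's size."""
    return name, len(data)


def process_tar_stream(archive_path, workers=4):
    """Walk a .tar.gz once, handing each file's contents to a worker pool."""
    results = {}
    with tarfile.open(archive_path, mode="r|gz") as tar, \
            ThreadPoolExecutor(max_workers=workers) as pool:
        futures = []
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, and other non-regular members
            # In stream mode the data must be read before advancing to the
            # next member, so pull it into memory here.
            data = tar.extractfile(member).read()
            futures.append(pool.submit(process_bytes, member.name, data))
        for future in futures:
            name, size = future.result()
            results[name] = size
    return results
```

Nothing is written to disk, but file contents queue up in memory if the workers fall behind the reader, so a real version would probably want to cap the number of pending jobs.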

