[Bug-tar] Solution to updating compressed archives: Pre-compress files

Hello,

First, thanks! I've been using GNU tar (and most of the other GNU software) for almost two decades now (I'm 36). Even though GNU isn't Unix, many of the best things about Unix are really thanks to GNU.

Now...

Problem:

I frequently run into situations where I need to update archives. I also want to conserve space, of course, so I use compression. The current version of tar does not directly support combining the two: a compressed archive cannot be updated in place.

I am clearly not alone here:

http://www.google.com/search?q=tar+update+compressed+archive

I understand that the obstacle lies more with the compression programs, which do not support updating, than with tar itself. But that is only because we have boxed ourselves into a corner by assuming that compression is something the regular tar output gets piped through. There is another way for tar to leverage compression.

Solution:

The solution requires two parts of the code to be modified:

1) Compress each file before adding it to the archive.

2) Extend the per-file metadata (the header tar stores for each file in the archive) with a field that records which compression algorithm/program was used for that file (if any).
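
The two parts above can be sketched roughly as follows. This is only an illustration in Python (using its standard tarfile module), not a patch for tar itself. It assumes gzip as the per-file algorithm and stores the algorithm name in a pax extended header record; the keyword "EXAMPLE.compression" and the helper names are made up for this sketch and are not part of GNU tar or any standard.

```python
import gzip
import io
import tarfile

# Hypothetical pax keyword; not defined by GNU tar or any standard.
COMPRESSION_KEY = "EXAMPLE.compression"


def add_precompressed(archive_path, member_name, data):
    """Gzip `data` on its own and append it to an uncompressed tar archive."""
    payload = gzip.compress(data)
    info = tarfile.TarInfo(name=member_name)
    info.size = len(payload)
    # Part 2: record the algorithm in the member's own header.
    info.pax_headers = {COMPRESSION_KEY: "gzip"}
    # Mode "a" appends to an uncompressed archive, creating it if missing.
    with tarfile.open(archive_path, "a", format=tarfile.PAX_FORMAT) as tar:
        tar.addfile(info, io.BytesIO(payload))


def read_member(archive_path, member_name):
    """Return the original bytes of a member, decompressing if it is marked."""
    with tarfile.open(archive_path, "r") as tar:
        # getmember() returns the last occurrence of a repeated name.
        info = tar.getmember(member_name)
        raw = tar.extractfile(info).read()
        if info.pax_headers.get(COMPRESSION_KEY) == "gzip":
            return gzip.decompress(raw)
        return raw
```

Because the outer archive is never compressed as a whole, appending a newer copy of a file is an ordinary tar append. Python's tarfile treats the last occurrence of a name as the most up-to-date version, so the appended copy is the one that gets read back.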

Interface changes:

* There would need to be a new flag (-p, --pre_compress)

Cons:

* The resulting archive would not be quite as small as one compressed as a whole, because each file is compressed independently and redundancy shared across files is not exploited.

* This is not a small code change.

Pros:

* Tar could then support the full set of update operations (adding, replacing, removing) on individual files in the archive.

* Not all files need to be compressed. I frequently create back-ups of directories that contain compressed files. Tar could detect that files with names matching .t?gz or .bzip\d? are already compressed and store them as-is. Different files could also be compressed using different algorithms.
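
The per-file decision in the last point could look something like the sketch below. It is illustrative only: the extension list is an assumption (a real implementation would probably want it configurable, or would sniff file contents instead), and the function name is hypothetical.

```python
import re

# Illustrative pattern only; the extension list is an assumption.
ALREADY_COMPRESSED = re.compile(r"\.(t?gz|bz(ip)?2?|xz|zip)$", re.IGNORECASE)


def choose_algorithm(filename, default="gzip"):
    """Return None for names that look already compressed, else `default`."""
    if ALREADY_COMPRESSED.search(filename):
        return None  # store the file as-is
    return default
```

A caller would store the file unmodified when this returns None, and otherwise compress it with the returned algorithm and record that choice in the file's header.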

I have a hackish Perl package/script that we use at Bee for creating archives that work this way. I also happen to know that at least a couple of tech groups within IBM have conventions and code for working with archives in this fashion.

I imagine that the toughest part would be the change to the headers. I would strongly suggest moving to a named-element header format, so that the headers can be extended later without much work.
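
One existing named-element format that might serve here is the pax interchange format from POSIX.1-2001, which GNU tar can write with --format=posix and which stores extra per-file records as keyword=value pairs. The sketch below uses Python's tarfile module only to show that such records round-trip and that members without them are unaffected. The keyword "EXAMPLE.compression" and the function names are assumptions for illustration, not an existing convention.

```python
import io
import tarfile


def build_archive(members):
    """Build an in-memory pax archive.

    `members` is an iterable of (name, data_bytes, extra_records) tuples,
    where extra_records is a dict of keyword -> string value.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        for name, data, extra in members:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.pax_headers = dict(extra)  # named per-file records
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_headers(archive_bytes):
    """Map each member name to the named records found in its header."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r") as tar:
        return {member.name: dict(member.pax_headers) for member in tar}
```

New keywords can be added later without changing the fixed-size part of the header, which is the extensibility property asked for above.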

Thanks,

-Carl

Carl Eklof
President @ Bee Software
address@hidden | p: 424.888.4BEE | f: 801.439.4213 | http://beesw.com/

From:	Carl Eklof
Subject:	[Bug-tar] Solution to updating compressed archives: Pre-compress files
Date:	Sat, 29 Oct 2011 10:04:26 -0700 (PDT)