bug-gnu-utils

Re: Extensions to Tar


From: Paul Eggert
Subject: Re: Extensions to Tar
Date: Sat, 30 Mar 2002 22:25:08 -0800 (PST)

> From: Chris Wilson <address@hidden>
> Date: Sun, 31 Mar 2002 02:02:31 +0100 (BST)
> 
> I've implemented the "reserved" -y option for compressing individual files 
> in the archive using zlib (not gzip). It seems to be working okay, with a 
> few bugs which I am tracking down and fixing.

Yes, it is difficult to get right, isn't it?

>  Would you be interested in adding this code to tar? I am happy to
> share it under the GPL.

For an extensive change like that we'd need legal papers.  Have you done
this before with the FSF?  If not I can send you the forms.

> The main disadvantage is that tar currently requires zlib to build
> with this extension, although it could be protected with #ifdefs.

It would have to be buildable without zlib.  Not every platform has
zlib.  'configure' should have a --with-zlib=PATH option (see how
OpenSSH does it, for example).

> I'm also considering adding the ability to filter files through GPG, for 
> encrypted backups.

That might also be nice, though I think it's lower priority.

> One problem I encountered with filtering was that in order to write the 
> file header, tar needs to know _in advance_ the compressed/filtered size 
> of the data. This means either compressing the entire file in memory 
> before writing it, or compressing twice (once to get the size and write 
> the header, then again to write the compressed data). Can anyone see any
> ways around this problem?

I can't.

> (currently, I compress in memory, which limits 
> the maximum file size which can be compressed to the available virtual 
> memory).

You need to be able to fall back to plan B (compressing it twice);
otherwise the code won't be robust enough in practice.
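
A minimal sketch of plan B, written in Python for brevity rather than
in tar's C.  It assumes the input is a seekable file, and write_header
is a hypothetical callback standing in for tar's header-writing code;
neither name comes from Chris's patch.  The first pass compresses and
throws the output away to learn the size, the second pass compresses
again and writes the data, so memory use stays bounded by the chunk
size.

```python
import io
import zlib

CHUNK = 64 * 1024  # read size; an arbitrary choice for this sketch


def compressed_chunks(src):
    """Yield zlib-compressed pieces of the file object src, chunk by chunk."""
    comp = zlib.compressobj()
    while True:
        block = src.read(CHUNK)
        if not block:
            break
        piece = comp.compress(block)
        if piece:
            yield piece
    yield comp.flush()


def write_member(src, out, write_header):
    """Two-pass write: pass 1 measures the size, pass 2 emits the data.

    write_header is a hypothetical callback that receives the output
    stream and the compressed size.
    """
    # Pass 1: compress and discard, keeping only the total size.
    size = sum(len(piece) for piece in compressed_chunks(src))
    write_header(out, size)

    # Pass 2: rewind and compress again, this time writing the output.
    src.seek(0)
    written = 0
    for piece in compressed_chunks(src):
        out.write(piece)
        written += len(piece)

    # If the file changed between the passes, the sizes can disagree.
    if written != size:
        raise RuntimeError("file changed between passes")
    return size
```

The size check at the end matters: a file that is modified between the
two passes would otherwise leave a header that disagrees with the data
that follows it.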

> I have implemented support for these data types using custom values of the 
> typeflag header field: 'Y' for zlib-filtered data and 'E' for 
> GPG-encrypted data. I hope this is the "right" way to do it.

I would like future extensions to be compatible with POSIX 1003.1-2001
pax Interchange Format.  Please see:

http://www.UNIX-systems.org/version3/online.html

Registration is required, the terms are here:
http://www.opengroup.org/onlinepubs/terms.htm

Look at the 'pax' command for a description of the format.
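
For orientation, a sketch of how one pax extended header record is
encoded: each record is "LEN KEYWORD=VALUE" followed by a newline,
where LEN is the decimal length in bytes of the whole record,
including LEN itself.  The keyword EXAMPLE.filter in the usage note is
a made-up vendor-style keyword for illustration only; it is not part
of the standard or of anyone's patch.

```python
def pax_record(keyword, value):
    """Encode one pax extended header record: b"LEN KEYWORD=VALUE\\n".

    LEN counts every byte of the record, including its own digits,
    so it is found by iterating until the length is self-consistent.
    """
    body = b" " + keyword.encode("utf-8") + b"=" + value.encode("utf-8") + b"\n"
    length = len(body)
    while True:
        total = len(str(length)) + len(body)
        if total == length:
            return str(length).encode("ascii") + body
        length = total
```

For example, pax_record("path", "foo") gives b"12 path=foo\n", and a
filter could in principle be announced with something like
pax_record("EXAMPLE.filter", "zlib") instead of a new typeflag value.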

> I would like tar to be able to save, restore and modify archives on a 
> remote system without using rmt, since it seems to be a big security hole. 
> I have untrusted clients who I want to be able to back up to a central 
> server without giving them rsh or ssh shell access or anything similar. 

With ssh you can give them only the ability to run a certain command,
presumably rmt or a wrapper for rmt.  Isn't that enough to give you
what you want?

> I would also like to be able to 'update' archives by writing a new file, 
> rather than modifying the existing one, to create incremental backups 
> without risking damage to the original archive. Would anyone like me to 
> implement this?

Not me.  :-)  You should be able to do that by telling tar to read
the archive from stdin and write it to stdout, updating as you go.
But as far as I know this part of the tar code has never been that
reliable.

> Finally, for differential/incremental backups, I was thinking about using 
> md5sums to detect changed files,

Why?  You don't trust the time stamps?

> Should I stick to using the mtime?

Should be the ctime.  But I'm lost as to why you'd want to use MD5
here.  And if you're paranoid enough to want a cryptographic checksum,
you should use a better one than MD5; SHA-1, say.
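
A rough sketch of the two approaches, assuming a POSIX system (on
Windows st_ctime means something else).  The function names and the
last_backup_time parameter are illustrative, not anything in tar.

```python
import hashlib
import os


def changed_since(path, last_backup_time):
    """True if the file's ctime is later than the last backup.

    ctime is updated when the contents change and also on chmod or
    chown, and unlike mtime it cannot be set back with utime().
    """
    return os.lstat(path).st_ctime > last_backup_time


def sha1_of(path, chunk=64 * 1024):
    """Hex SHA-1 of the file contents, read in fixed-size chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            digest.update(block)
    return digest.hexdigest()
```

The ctime test needs only one lstat() call per file, whereas the
checksum has to read every byte of every file on every run.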

Anyway, thanks for your thoughts.  I hope I haven't scared you away
from contributing....


