From: Joerg Schilling
Subject: Re: [Bug-tar] Interchange/performance issue with archive containing sparse file
Date: Sun, 06 Feb 2011 22:56:23 +0100
User-agent: nail 11.22 3/20/05

Tim Kientzle <address@hidden> wrote:

> > 
> > If GNU tar archives sparse files, it creates archives that violate the 
> > POSIX structuring conventions for TAR archives. 
>
> The newer GNU tar --posix support addresses this, though
> it's not (yet?) the default format for GNU tar.  I think the
> current "1.0" variant is pretty well thought out (though I do have
> a couple of small quibbles. ;-)
>
> Libarchive now supports the GNU tar --posix "1.0" variant when
> writing sparse files.

I am not sure what you mean by posix version 1.0. The first GNU tar 
implementation that moved the hole description data into the POSIX extended 
headers did not have the problem that huge amounts of xheader data need to be 
allocated for parsing, but it was in conflict with the POSIX rules for xheaders 
as it _repeated_ line pairs like:

16 GNU.xxx.hole=123456
17 GNU.xxx.data=1234567

but the POSIX standard says that in case of repeated entries, the last one is 
valid.
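
To illustrate the conflict, here is a minimal Python sketch of an xheader parser that follows the "last one is valid" rule. The keyword names GNU.xxx.hole and GNU.xxx.data are just the placeholders from the example above, and make_record() and parse_xheader() are hypothetical helpers, not code from star or GNU tar. The sketch assumes the pax record layout "<length> <keyword>=<value>\n", where the length counts the whole record.

```python
def make_record(key, value):
    """Build one record "<len> <key>=<value>\\n"; <len> counts the whole record."""
    body = f" {key}={value}\n".encode("utf-8")
    length = len(body) + 1
    # The length field counts its own digits, so iterate until it is stable.
    while len(str(length)) + len(body) != length:
        length = len(str(length)) + len(body)
    return str(length).encode("ascii") + body


def parse_xheader(data):
    """Parse records into a dict; a repeated keyword overwrites the earlier one."""
    result = {}
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)
        length = int(data[pos:space])
        # Everything after the space, minus the trailing newline.
        record = data[space + 1 : pos + length - 1]
        key, _, value = record.partition(b"=")
        result[key.decode("utf-8")] = value.decode("utf-8")
        pos += length
    return result


# Two hole/data pairs written as repeated keywords.
xheader = (
    make_record("GNU.xxx.hole", "0")
    + make_record("GNU.xxx.data", "4096")
    + make_record("GNU.xxx.hole", "8192")
    + make_record("GNU.xxx.data", "512")
)
sparse_map = parse_xheader(xheader)
print(sparse_map)
```

A reader that applies the rule this way keeps only the final hole/data pair, so the earlier pairs of the sparse map are lost.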

I assume that the current variant thus cannot be called "1.0". It is different 
and IIRC, it has a very long line of hole/data pairs. This is neither 
easy to read (star would need to malloc space for the maximum size of the 
xheader as the data is not block oriented), nor does it allow archiving larger 
sparse files. Note that the max. size of an xheader is 8 GB. Note also that this 
would still not allow a 32 bit tar program to handle the max. size, as a 
32 bit process cannot grow to even 4 GB.

A file with maximum sparseness thus currently can only grow to approx. 3 TB
before it is no longer archivable by GNU tar, even with a 64 bit binary.

If I compare the currently available methods for handling the sparse data, the 
currently used method from star still seems to be the best.

-       The data is block oriented and thus can be read on the fly without a 
        need to malloc() space for all of the ASCII data to be parsed.

-       The base 256 format I introduced in the mid 1990s is smaller than 
        archiving the numbers as decimal strings.

-       The base 256 format still allows 95 bits for the file size which is 
        sufficient for any local storage in a single universe, as this would
        take approx. 1 MegaMol for active storage (net) mass if one bit takes
        one atom.

-       It is located in the file data space and thus unlimited in size.
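
The base 256 points above can be sketched in Python as follows. This is only an illustration under stated assumptions: a 12 byte numeric field (the width of the size field in a tar header), the top bit of the first byte used as the "binary" flag, and non-negative values only. The function names are hypothetical and not taken from star.

```python
FIELD_WIDTH = 12  # assumed field width in bytes, as for the tar size field


def encode_base256(value, width=FIELD_WIDTH):
    """Store value as big-endian binary with the top bit of byte 0 set."""
    usable_bits = width * 8 - 1  # one bit is spent on the flag: 95 for 12 bytes
    if value < 0 or value >= 1 << usable_bits:
        raise ValueError("value does not fit into the field")
    raw = bytearray(value.to_bytes(width, "big"))
    raw[0] |= 0x80  # flag bit: field is base 256, not an octal string
    return bytes(raw)


def decode_base256(field):
    """Inverse of encode_base256()."""
    if not field[0] & 0x80:
        raise ValueError("flag bit not set, field is not base 256")
    raw = bytearray(field)
    raw[0] &= 0x7F  # drop the flag bit before converting
    return int.from_bytes(raw, "big")


largest = (1 << 95) - 1
field = encode_base256(largest)
decimal_digits = len(str(largest))  # size of the same number as a decimal string
print(len(field), decimal_digits)
```

The largest 95 bit number fits into the 12 byte field, while the same number written as a decimal string needs 29 digits.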

I am not sure whether the current GNU tar sparse format will last for long, 
and this is why I am not sure whether I should implement support for it.


        
> > In future, the tar file format could be updated to allow sparse files to
> > be archived in a single pass, but it would require ...
>
> I've considered approaches like this for libarchive, but
> I haven't found the time to experiment with them.
>
> Specifically, this could be done without seeking
> (and without completely ignoring the standards)
> by recording a complete tar entry for each "packet" of a file

If the file is archived in chunks, it could be done without seeks.
BTW: Star has implemented a FIFO for more than 20 years and because of this
FIFO, it cannot easily be upgraded to support seeking.

Jörg

-- 
 EMail:address@hidden (home) Jörg Schilling D-13353 Berlin
       address@hidden                (uni)  
       address@hidden (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily


