Re: [Bug-tar] Sparse file performance and suggestions


From: Joerg Schilling
Subject: Re: [Bug-tar] Sparse file performance and suggestions
Date: Sun, 06 Feb 2011 16:49:45 +0100
User-agent: nail 11.22 3/20/05

address@hidden wrote:

> Currently, tar seems to perform quite sub-optimally when archiving sparse
> files. I compared the performance of GNU tar and star when archiving a
> large (~2TB) sparse file. All but about 180MB of the file was holes.
>
> The archives created by star and GNU tar were of identical size, and both
> programs could extract each correctly. Extraction times were similar (both
> under 4 seconds). However, GNU tar took about 2.8 times as long as star to
> create the archive.

I assume that you are testing on an OS that does not implement 
SEEK_HOLE/SEEK_DATA, as star is even faster than GNU tar in the case that the 
OS helps to retrieve the sparse file info.
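
To illustrate how an OS can help retrieve the sparse file info, here is a
minimal sketch in Python (not code from star or GNU tar). The function name
data_extents is hypothetical. It assumes a platform where os.SEEK_DATA and
os.SEEK_HOLE exist; where they do not, the sketch simply reports the whole
file as one data region.

```python
import errno
import os


def data_extents(path):
    """Return [(offset, length), ...] for the data regions of a file."""
    size = os.path.getsize(path)
    # Assumption: without SEEK_DATA/SEEK_HOLE, treat the whole file as data.
    if not (hasattr(os, "SEEK_DATA") and hasattr(os, "SEEK_HOLE")):
        return [(0, size)] if size else []

    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while offset < size:
            try:
                # Jump to the next byte that is not inside a hole.
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:  # only a hole remains up to EOF
                    break
                raise
            # Find where this data region ends; EOF counts as a hole.
            end = os.lseek(fd, start, os.SEEK_HOLE)
            extents.append((start, end - start))
            offset = end
    finally:
        os.close(fd)
    return extents
```

On a filesystem that does not track holes, the calls still succeed but report
the file as a single data region, so a scan like this costs a few system
calls per region instead of reading the whole ~2TB of the file.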


> In future, the tar file format could be updated to allow sparse files to
> be archived in a single pass, but it would require that the archive be
> seekable. Or alternatively, tar would need a buffer at least as big as the
> largest non-hole region. (Extraction wouldn't need a seekable archive.)

The tar archive format is based on the assumption that it is written to 
non-seekable media. You need to know the archived size of a file in order to 
write the tar header, and you cannot know that before you have scanned the 
file for holes.
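
The ordering constraint can be seen with Python's tarfile module, used here
only as an illustration (it is neither star nor GNU tar). The helper name
ustar_header and the file name are hypothetical. The sketch builds the
512-byte header block that has to be written before any file data, and that
block already contains the size.

```python
import tarfile


def ustar_header(name, stored_size):
    """Build the ustar header block that precedes a member's data."""
    info = tarfile.TarInfo(name)
    info.size = stored_size  # must be known before the header is emitted
    return info.tobuf(format=tarfile.USTAR_FORMAT)


# Example: a member whose stored data amounts to 180 MiB.
header = ustar_header("sparse.img", 180 * 1024 * 1024)
# The size lives at byte offset 124 of the header as 11 octal digits.
size_from_header = int(header[124:135], 8)
```

Because the header leaves the archiver before the data does, a non-seekable
archive cannot be patched afterwards with a size discovered during a single
read pass.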

For further thoughts on the archive format, it makes sense to rethink how GNU 
tar currently archives sparse files. The format currently used by GNU tar takes
much more program data space than needed, and it is limited in the maximum 
number of holes that can be archived.

As the hole data is no longer block aligned, it cannot be read blockwise and 
needs a big malloc(). As the sparse meta data is stored inside the POSIX meta 
data area, it is limited to 8 GB. This limits the number of hole/data pairs, 
and a file with maximum holeyness may not be larger than a few TB.
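
A rough back-of-the-envelope sketch of this limit follows. Everything except
the 8 GB figure is an assumption made for illustration: the 8 GB is taken to
be the 11-octal-digit size field of a tar header (8**11 bytes), while the
number of bytes needed to describe one hole/data pair and the smallest span of
file covered by one pair are hypothetical parameters, not values taken from
GNU tar.

```python
USTAR_SIZE_LIMIT = 8 ** 11  # 11 octal digits in the header size field: 8 GiB


def max_sparse_file_size(bytes_per_pair, min_pair_span, limit=USTAR_SIZE_LIMIT):
    """Estimate how large a maximally fragmented sparse file may become.

    bytes_per_pair: assumed meta data bytes used to describe one hole/data pair
    min_pair_span:  assumed smallest file span of one hole plus one data block
    """
    pairs = limit // bytes_per_pair  # pairs that fit into the meta data area
    return pairs, pairs * min_pair_span


# Hypothetical values: 24 bytes of text per pair, 4 KiB hole + 4 KiB data.
pairs, max_bytes = max_sparse_file_size(bytes_per_pair=24, min_pair_span=8192)
```

With these assumed values the bound comes out at roughly 2.9e12 bytes, the
same order of magnitude as the few TB mentioned above. Other assumptions
about the encoding or the block size shift the result considerably.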

Jörg

-- 
 EMail:address@hidden (home) Jörg Schilling D-13353 Berlin
       address@hidden                (uni)  
       address@hidden (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily


