
Re: [Bug-tar] High per file overhead?

From: Joerg Schilling
Subject: Re: [Bug-tar] High per file overhead?
Date: Sat, 25 Feb 2006 14:34:28 +0100
User-agent: nail 11.2 8/15/04

Phillip Susi <address@hidden> wrote:

> Can anyone explain this?
> ~$: du -bsh Maildir/
> 98M   Maildir/
> ~$: tar cf Maildir.tar Maildir/
> ~$: du -bsh Maildir.tar
> 112M  Maildir.tar
> ~$: find Maildir | cpio -o -H newc > Maildir.cpio
> 204433 blocks
> ~$: du -bsh Maildir.cpio
> 100M  Maildir.cpio
> Why does tar have 12M more overhead than cpio?  This Maildir is the lkml 
> since Jan 1, so it contains ~20,000 messages/files, but ~734 bytes per 
> file seems like a bit much for overhead.

As cpio does not offer a -H newc format, let me assume that you are talking
about the -c or -H crc format.

cpio is unblocked and thus has trouble resyncing after a part of the archive
that appears to be corrupted.

du only counts the file content and a part of the metadata (it does not count,
e.g., the "inode", see /usr/include/sys/fs/ufs_inode.h).

cpio -Hcrc writes a 110 byte header + the file path name + the file content.
tar in the historical format or the POSIX.1-1988 format writes a 512 byte
header + the file content rounded up to the next 512 byte boundary.
Recent tar (POSIX.1-2001, aka "pax") writes at least 1 KB per file in addition.

Conclusion: if you write more metadata, you have more overhead.
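
To make the per-file arithmetic concrete, here is a minimal sketch in Python of the sizes described above. The function names are made up for illustration, and the cpio padding rule (NUL-terminated name, header plus name and content each padded to 4 bytes) is an assumption about the -Hcrc layout that goes beyond what is stated above. Archive trailers, the tar blocking factor and pax extended headers are ignored.

```python
# Rough per-member size model for one regular file.
# Not an exact archive size calculator: trailers and blocking are ignored.

def round_up(n, multiple):
    """Round n up to the next multiple of `multiple`."""
    return -(-n // multiple) * multiple

def tar_member_size(file_size):
    # Historical / POSIX.1-1988 tar: 512 byte header
    # + content rounded up to the next 512 byte boundary.
    return 512 + round_up(file_size, 512)

def cpio_member_size(path_len, file_size):
    # cpio -Hcrc: 110 byte header + path name + content.
    # Assumption: the name is NUL-terminated, and header+name
    # and content are each padded to a 4 byte boundary.
    return round_up(110 + path_len + 1, 4) + round_up(file_size, 4)

if __name__ == "__main__":
    size, path_len = 5000, 60   # hypothetical mail file and path length
    tar = tar_member_size(size)
    cpio = cpio_member_size(path_len, size)
    print("tar: ", tar)          # 5632
    print("cpio:", cpio)         # 5172
    print("diff:", tar - cpio)   # 460
```

For many small files the tar padding averages about half a block per file on top of the larger header, so a difference of several hundred bytes per file between the two formats is what this model predicts.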

But in real-world use this has no relevance:

star -cPM -time f=/dev/null -C /usr .
star: 107825 blocks + 6656 bytes (total of 1104134656 bytes = 1078256.50k).
star: Total time 136.532sec (7897 kBytes/sec)

star -cPM -Hasc -time f=/dev/null -C /usr .
star: 104818 blocks + 2560 bytes (total of 1073338880 bytes = 1048182.50k).
star: Total time 134.415sec (7798 kBytes/sec)

The additional overhead that results from the tar format is typically less
than 3%. If you compress the result and use an archiver that takes care to
maximize compressibility (as star does), even the small "advantage" of the
cpio format goes away: the remaining difference is less than 1%.
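
As a quick check of the "less than 3%" figure, the relative overhead can be computed from the two byte totals star reported above. This is only a sketch in Python; the variable names are made up for illustration.

```python
# Byte totals copied from the two star runs above.
tar_total = 1104134656    # star -cPM        (tar based format)
cpio_total = 1073338880   # star -cPM -Hasc  (cpio ASCII format)

extra_bytes = tar_total - cpio_total
overhead_pct = 100.0 * extra_bytes / cpio_total

print("extra bytes:", extra_bytes)
print("tar overhead relative to cpio: %.2f%%" % overhead_pct)
```

For this /usr tree the result comes out a little under 3%, in line with the statement above.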


 EMail:address@hidden (home) Jörg Schilling D-13353 Berlin
       address@hidden                (uni)  
       address@hidden     (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
