
Re: [Bug-tar] Bug with files which have hardlinks


From: Tim Kientzle
Subject: Re: [Bug-tar] Bug with files which have hardlinks
Date: Sat, 9 Oct 2010 13:41:36 -0700

On Oct 9, 2010, at 12:19 AM, Bob Proulx wrote:
> 
>  $ tar cvvf x.tar afile afile afile

This is a somewhat strange request you're making.
I doubt many tar implementations have done anything to
optimize this specific case (de-duping the argument list
would be one strategy, although -C handling makes that
a bit more complex than it sounds).

>  $ tar cvvf x.tar afile afile afile
>  -rw-rw-r-- bob/bob          32 2010-10-09 01:07 afile
>  -rw-rw-r-- bob/bob          32 2010-10-09 01:07 afile
>  -rw-rw-r-- bob/bob          32 2010-10-09 01:07 afile


>  $ tar cvvf x.tar afile afile afile
>  -rw-rw-r-- bob/bob          32 2010-10-09 01:07 afile
>  hrw-rw-r-- bob/bob           0 2010-10-09 01:07 afile link to afile
>  hrw-rw-r-- bob/bob           0 2010-10-09 01:07 afile link to afile

These are both "reasonable" answers to your request.
Neither is really wrong, so there's not really a bug here.
The latter version is smaller (a link entry generally takes
less space than a full copy of the file), but the former is easier
to restore.  Which one you see will depend heavily on
which tar implementation you're using (GNU tar is just one of
many) and how it optimizes detecting hard links.

The basic strategy used by tar implementations for archiving
hard links is to keep a table that maps dev/ino values to
file names and create a hard link entry in the archive when
the tar program sees something that's already in the table.
The most straightforward implementation of this strategy would
give you the output you listed second regardless of the number
of actual links on the file.
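
As a rough illustration of that basic strategy (a minimal sketch, not
the code of GNU tar or any other real implementation; the function name
plan_entries and the tuple layout are made up for this example):

```python
import os

def plan_entries(paths):
    """Return (kind, path, target) tuples; kind is 'file' or 'link'.

    Hypothetical helper: it only decides what would be written,
    it does not produce an archive.
    """
    seen = {}  # (st_dev, st_ino) -> first name archived for that inode
    entries = []
    for path in paths:
        st = os.lstat(path)
        key = (st.st_dev, st.st_ino)
        if key in seen:
            # Already archived under another (or the same) name:
            # emit a hard link entry pointing at the first name.
            entries.append(("link", path, seen[key]))
        else:
            seen[key] = path
            entries.append(("file", path, None))
    return entries
```

Calling plan_entries(["afile", "afile", "afile"]) yields one "file"
entry followed by two "link" entries, whatever the link count is.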

But such tables can get very large if you're using tar to
back up a very large filesystem (a system with a billion
files could easily require hundreds of gigabytes to store
every filename).  So tar implementations generally skip
adding something to this table when the link count reported
by the filesystem is 1.
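
A sketch of the same table with that optimization added, again
hypothetical (plan_entries_nlink is an invented name, and real
implementations also have to deal with directories, -C, and so on,
which are ignored here):

```python
import os

def plan_entries_nlink(paths):
    """Like the plain dev/ino table, but skip files whose link count is 1."""
    seen = {}  # (st_dev, st_ino) -> first name archived for that inode
    entries = []
    for path in paths:
        st = os.lstat(path)
        key = (st.st_dev, st.st_ino)
        if key in seen:
            entries.append(("link", path, seen[key]))
            continue
        entries.append(("file", path, None))
        # Only remember inodes that can have another name somewhere;
        # this keeps the table small on filesystems with few hard links.
        if st.st_nlink > 1:
            seen[key] = path
    return entries
```

With a link count of 1, the same name given three times produces three
"file" entries; once the file has a second link, the repeats come out
as "link" entries.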

In the cases above, this explains the difference you're
seeing.  In the first case, nothing was entered into the
internal table, so the tar program saw each "afile" as a
separate file to be archived.  Bumping the link count above 1
caused an entry to be added to the internal table and resulted
in links being generated.

You might also have seen the second "afile" get recorded
as a link and the third be stored normally if the tar program
had taken the further optimization of removing the internal
table entry when the expected number of references had
been seen.
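
One way that further optimization could look, as a sketch under my own
assumptions (the name plan_entries_with_eviction and the bookkeeping of
"remaining" references are illustrative, not taken from any particular
tar):

```python
import os

def plan_entries_with_eviction(paths):
    """Drop a table entry once all expected links to it have been seen."""
    pending = {}  # (st_dev, st_ino) -> (first name, references still expected)
    entries = []
    for path in paths:
        st = os.lstat(path)
        key = (st.st_dev, st.st_ino)
        if key in pending:
            name, remaining = pending[key]
            entries.append(("link", path, name))
            if remaining <= 1:
                # Every link the filesystem reported has been accounted for.
                del pending[key]
            else:
                pending[key] = (name, remaining - 1)
            continue
        entries.append(("file", path, None))
        if st.st_nlink > 1:
            # The first name uses one reference; st_nlink - 1 are still expected.
            pending[key] = (path, st.st_nlink - 1)
    return entries
```

For a file with a link count of 2 that is named three times on the
command line, this gives file, link, file: the second reference
exhausts the expected count, the entry is removed, and the third
reference looks like a new file.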

It's interesting to compare this with the strategies required
when writing formats such as the newer cpio variants (which
effectively store hard link entries first, and the "real" file
data last).

As Joerg pointed out, the more interesting problem is
how this gets handled on extract.  The second form is
tricky to restore correctly.

Cheers,

Tim



