Re: [Bug-tar] Incremental extract improvement

From: Helmut Waitzmann
Subject: Re: [Bug-tar] Incremental extract improvement
Date: Fri, 26 May 2006 11:20:42 +0200
User-agent: Gnus/5.1006 (Gnus v5.10.6) Emacs/21.4 (gnu/linux)

Ian Turner <address@hidden> writes:

>On Tuesday 16 May 2006 18:16, Helmut Waitzmann wrote:
>> As tar stores all file names of a directory (regardless of whether the
>> files themselves are stored in the incremental archive or not), it
>> /knows/ which files are to be deleted when incremental extracting an
>> incremental archive:  It simply deletes all files whose names are
>> /not/ in the incremental archive, thus truly applying the changes,
>Ah, but here's the rub: This step only truly applies the changes from the
>incremental archive, if there were no other unrelated files in the directory.
>Consider the following example:

May I insert some shell style comments in order to mark sequence points?

>$ mkdir foo
>$ touch foo/bar
>$ tar -cvf test.tar --listed-incremental=listing foo/

$ # Sequence point #1

>$ rm foo/bar
>$ touch foo/baz
>$ tar -cvf test2.tar --listed-incremental=listing foo/

$ # Sequence point #2

>$ touch foo/bat

$ # Sequence point #3

>$ tar -xvf test2.tar --listed-incremental foo/
>$ ls foo
>The point here is that the last execution of tar did /not/ truly apply the
>changes from the incremental archive; instead, it made a completely unrelated
>change by deleting the file foo/bat, which had never been mentioned before.
>IMHO it should be possible to extract an incremental archive in such a way as
>to truly get /only/ the changes from that archive,

Ah.  Now I understand your problem:  You would like incremental
restores to be a means of manipulating different file hierarchies in a
similar way, for example by creating an incremental archive which, when
applied to different file hierarchies, removes "foo/bar", creates or
modifies "foo/baz" and does nothing beyond that: a tape archive seen as
a function that manipulates file hierarchies.

But that is not possible:  A GNU tar incremental archive does not
consist of file manipulating commands like "rm", "mv", "cp", "touch",
"chmod".  It simply holds, for each directory, a list of all existing
file names, plus the data of all files changed since the previous
archive.
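
As a rough illustration of that data model (not GNU tar's actual
implementation), the deletion rule for a single directory can be written
as a set difference.  The function name "names_to_delete" and the two
list files are hypothetical, and file names are assumed to contain no
newlines.

```shell
# names_to_delete CURRENT ARCHIVED   (hypothetical helper, for illustration)
#   CURRENT:  file with the names now present in the directory, one per line
#   ARCHIVED: file with the names recorded for that directory in the archive
# Prints every name that is present in the directory but absent from the
# archive's name list, i.e. the names an incremental extract would delete.
names_to_delete() {
    # -v: invert the match, -x: whole lines only, -F: fixed strings.
    # grep exits with status 1 when it selects no line; that just means
    # there is nothing to delete, so it is mapped to success.
    grep -vxF -f "$2" "$1" || [ "$?" -eq 1 ]
}
```

For the example above, at sequence point #3 the directory holds "baz"
and "bat" while "test2.tar" lists only "baz", so the sketch prints "bat".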

What happens when an incremental archive is extracted without first
extracting all preceding archives back to the last full archive is not
defined (see below).

>but I don't think the required information is really there.

Yes.  That information is not in the archive.  The intended usage of a
full archive followed by a series of incremental archives is to restore a
previous state of a file hierarchy, no matter what its current state is,
not to manipulate an arbitrary file hierarchy in a manner similar to the
manipulations that could be done in order to restore that file hierarchy
on which the archive had been created.

Let me comment about the sequence points above:

Sequence point #1:

Each file of the file hierarchy is archived in the full archive
"test.tar".  That is:  It does not matter whether you delete the whole
file hierarchy, fill it with arbitrary file names or make any other
changes before you restore it by executing the command

$ tar -xvf test.tar --listed-incremental=/dev/null foo/

which restores the file hierarchy to the state of sequence point #1 (as
long as GNU tar does not refuse to remove even whole file hierarchies).

Sequence point #2:

This state of the file hierarchy

(i.e. state #1 - "foo/bar" + "foo/baz")

is stored in the sequence of the archives "test.tar" and "test2.tar".

It can be restored by executing the command

$ (  for archive in test.tar test2.tar
     do
        tar -xvf "$archive" --listed-incremental=/dev/null foo/ || exit "$?"
     done
  )

Because "test2.tar" is an incremental archive, it serves no purpose
without "test.tar":  The effect of extracting it is not defined unless
the last preceding full archive ("test.tar") and all incremental
archives in between have been extracted beforehand, in order and
without further changes to the file hierarchy.

Sequence point #3:

This state of the file hierarchy

(i.e. state #1 - "foo/bar" + "foo/baz" + "foo/bat")

is not stored in an archive.  (There is no "test3.tar".)

As sequence point #3 was recorded neither in a full nor an incremental
archive, there is no chance to retrieve (or retain) that state by
extracting the archives "test.tar" and "test2.tar".
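
The whole scenario can be replayed in a throwaway directory.  The
following is only a sketch:  It assumes GNU tar, the function name
"demo_restore" is made up for illustration, and the "-v" option is left
out so that only the final listing is printed.

```shell
# demo_restore (hypothetical name): replay sequence points #1 to #3 in a
# temporary directory, then restore from the full and incremental archives.
# Assumes GNU tar; runs in a subshell so the caller's directory is untouched.
demo_restore() (
    workdir=$(mktemp -d) || exit 1
    cd "$workdir" || exit 1

    mkdir foo && touch foo/bar
    tar -cf test.tar --listed-incremental=listing foo/ || exit "$?"
    # Sequence point #1

    rm foo/bar && touch foo/baz
    tar -cf test2.tar --listed-incremental=listing foo/ || exit "$?"
    # Sequence point #2

    touch foo/bat
    # Sequence point #3 (never archived)

    # Valid restore: the full archive first, then the incremental one.
    for archive in test.tar test2.tar
    do
        tar -xf "$archive" --listed-incremental=/dev/null foo/ || exit "$?"
    done

    # Show what is left in the restored directory, then clean up.
    ls foo
    status=$?
    cd / && rm -rf "$workdir"
    exit "$status"
)
```

After calling "demo_restore", the listing should show the state of
sequence point #2, i.e. without "foo/bat".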


$ touch foo/bat
$ tar -xvf test2.tar --listed-incremental foo/

is not a valid restoring command sequence (although in this special case
it does no harm), because the precondition of "test2.tar" (that
"test.tar" has been extracted immediately beforehand) is not satisfied,
whereas

$ touch foo/bat
$ (  for archive in test.tar test2.tar
     do tar -xvf "$archive" --listed-incremental=/dev/null foo/ || exit "$?"
     done
  )

is correct.

So, if "foo/bat" is to be retained, it has to be archived beforehand:

$ touch foo/bat
$ tar -cf test3.tar --listed-incremental=listing foo/

If you create a full archive ("archive_0.tar") followed by a sequence of
N incremental archives ("archive_1.tar", "archive_2.tar", ...,
"archive_N.tar"), then the only valid extracting sequences are
("archive_0.tar", ..., "archive_n.tar") for all numbers n between 0 and
N, i.e. the extracting sequences are prefixes of the archiving sequence.
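
The prefix rule can be sketched as a small shell function.  This is an
illustration under assumptions, not a tested backup tool:  The function
name "restore_prefix" and the "TAR" variable (which only exists so that
another command can be substituted for tar) are hypothetical, and the
archive names follow the "archive_0.tar", ..., "archive_N.tar" pattern
used above.

```shell
# restore_prefix N [MEMBER...]   (hypothetical helper)
# Extracts archive_0.tar, archive_1.tar, ..., archive_N.tar in that order,
# i.e. the prefix of the archiving sequence that ends at archive_N.tar.
# Stops at the first failing extraction and returns its exit status.
restore_prefix() {
    n=$1
    shift
    i=0
    while [ "$i" -le "$n" ]
    do
        # TAR is an assumption of this sketch; it defaults to plain "tar".
        ${TAR:-tar} -xvf "archive_$i.tar" --listed-incremental=/dev/null "$@" ||
            return "$?"
        i=$((i + 1))
    done
}
```

For example, "restore_prefix 2 foo/" would extract "archive_0.tar",
"archive_1.tar" and "archive_2.tar", restoring the state recorded by the
second incremental archive.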

To get back to your problem:

>Ah, but here's the rub: This step only truly applies the changes from the
>incremental archive, if there were no other unrelated files in the directory.

If you want to restore an old file hierarchy state, then all files which
changed, appeared or disappeared after that state are by no means
"unrelated files":  If tar didn't roll them back, delete them or
recreate them, respectively, the state of the file hierarchy wouldn't
be the same as before.

I don't know whether GNU tar's "--exclude" and "--exclude-from" options
provide a means of "don't care" semantics when restoring a backup.  If I
remember correctly, this is not the case, but I may be wrong.

A GNU tar incremental archive is not a file hierarchy manipulating
function whose domain and image are sets of file hierarchies, I'm sorry
to say.  It is rather a pair of sets:  one containing the names of the
files that are not to be deleted, and the other containing the files to
be restored.

When writing me e-mail, please precede my e-mail address with my full
name, like
Helmut Waitzmann <address@hidden>, (Helmut Waitzmann) address@hidden
