From: Alexander Boström
Subject: Re: [Duplicity-talk] Feature request/discussion: Store identical files only once
Date: Tue, 24 Jun 2008 16:55:13 +0200

Mon 2008-06-23 at 23:11 -0700, Chris Knight wrote:
> 
> Basically, what I'm suggesting is that, rather than requesting that
> this functionality be added to duplicity, this could be built around
> duplicity with a little scripting.

Hi!

I have limited experience with Duplicity, but I've thought about this
problem and about a way it might be done.

So, how about this:

Add an option to specify, during full backups, the URL of an extra
"common public files" server. This would be a server which stores:

 A bunch of files, each with a checksum and an ID.

 A table of the form (checksum, ID, path).

(checksum, ID) would uniquely identify one file content. The ID could
always be 0 unless there's a checksum collision. There could be several
rows with the same checksum and ID but different paths, and also
several rows with the same path but different file contents.
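
In Python-ish terms, just to make the data model concrete (the hash
choice and the sample rows are made up):

  import hashlib

  def sha256sum(path):
      # Any strong hash would do; SHA-256 is just an example choice.
      h = hashlib.sha256()
      with open(path, "rb") as f:
          for block in iter(lambda: f.read(65536), b""):
              h.update(block)
      return h.hexdigest()

  # One (checksum, ID, path) row per published file. (checksum, ID) names
  # one file content; the ID stays 0 unless two different contents ever
  # collide on the checksum, in which case the second gets ID 1, and so on.
  index = [
      ("ab12...", 0, "/usr/bin/python"),      # made-up checksums
      ("ab12...", 0, "/usr/bin/python2.5"),   # same content, another path
      ("cd34...", 0, "/lib/libc.so.6"),
  ]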

On this server, you would publish files which you think would benefit
from not being in every backup. You could populate it by doing, for
every OS you use:

 Install the OS on a computer, storing no secret files on it.

 find /usr /lib /bin -print0 | \
   xargs -0 duplicity publish-common-files --url some://url
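
Per file, the publish step wouldn't have to do much. A rough sketch
(everything here is made up, including the storage layout):

  import hashlib, os, shutil

  def publish_common_file(path, store_dir, index):
      # index is the (checksum, ID, path) row list described above.
      checksum = hashlib.sha256(open(path, "rb").read()).hexdigest()
      ids = [i for c, i, _p in index if c == checksum]
      if ids:
          # Content (almost certainly) already stored; just record the path.
          # A paranoid version would compare bytes before reusing the ID.
          index.append((checksum, ids[0], path))
      else:
          shutil.copyfile(path, os.path.join(store_dir, "%s.0" % checksum))
          index.append((checksum, 0, path))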

During a full backup, Duplicity would first look up every file it's
about to back up on this extra server, using the file's path and
checksum. If the file is available there, it would store only a
reference to the file, plus its metadata, in the full backup.
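
Something like this, where backup.add_reference() and backup.add_file()
are made-up stand-ins for whatever Duplicity actually does internally:

  import hashlib

  def store_file(path, backup, common_index):
      # common_index maps (path, checksum) -> ID, built from the
      # server's table.
      checksum = hashlib.sha256(open(path, "rb").read()).hexdigest()
      file_id = common_index.get((path, checksum))
      if file_id is not None:
          # On the common server: store only a reference plus metadata.
          backup.add_reference(path, checksum, file_id)
      else:
          # Not published anywhere: store the file content as usual.
          backup.add_file(path)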

The point of using the path along with the checksum is to minimise the
risk of not backing up a file due to a (highly unlikely) checksum
collision. If a file with the right checksum but the wrong path is
available, it could still be used, provided extra caution is taken to
verify that it really is the same file. Or perhaps the system could
just match on the filename instead of the whole path.
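
The extra caution could simply be a byte-for-byte comparison against
the candidate; fetch_common_file() is a made-up backend call:

  def same_content(local_path, backend, checksum, file_id):
      # Only needed when the checksum matches but the path doesn't:
      # fetch the candidate and compare bytes, trusting nothing else.
      remote = backend.fetch_common_file(checksum, file_id)
      return open(local_path, "rb").read() == remote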

With naive per-file lookups, the common files server would learn the
list of all files on the backup client and their checksums. To prevent
this, for privacy and security reasons, the client could instead
download the whole index and do the lookups locally.
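
For example (fetch_index_rows() is again a made-up backend call):

  def load_common_index(backend):
      # Pull the whole (checksum, ID, path) table once and look up
      # locally, so the server never learns what this client has.
      table = {}
      for checksum, file_id, path in backend.fetch_index_rows():
          table[(path, checksum)] = file_id
      return table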

Restore would require access to both the backup storage and the common
files server, but they might as well be the same server.

With such a scheme, I hope one wouldn't need to care about choosing
which directories to back up: just back up everything, even with a
large number of clients.

Do you think this could fit into Duplicity reasonably well? The idea is
to implement the common files server as extensions to the current
backends. For most backends the files and the table would just be a
bunch of files in a directory structure.
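
For example, something like this (names made up), with the table stored
as one "checksum ID path" line per row:

  some://url/common-files/
    index         <- the (checksum, ID, path) table
    ab12....0     <- file contents, stored as "<checksum>.<ID>"
    cd34....0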

/abo





