
Re: [Duplicity-talk] Backing up a whole filesystem?

From: David Stanaway
Subject: Re: [Duplicity-talk] Backing up a whole filesystem?
Date: Thu, 13 Aug 2009 20:39:09 -0500
User-agent: Thunderbird (Windows/20090605)

We could re-invent the commercial wheel (look up deduplication; there are a few approaches out there). The commercial products had different ways to slice and dice the data efficiently into a global pool of compressed pieces of files, plus a hash map carrying the file metadata and pointers to each file's components.

I could run some tests to see how rdiff scales at pulling the commonality out of large solid archives (like a tar or fsdump with no compression), for small definitions of "large": backing up a small VPS's email, config, web and log data, for instance.

In my case, I want compressed, secure offsite backups of 1-2 GB of Cyrus data and 500 MB of website data and logs (mostly images, with a very low change rate).

My main constraint is bandwidth.

E.g., what I would like to do is take e2fs dumps from LVM snapshots of var, home and root to a stream (so as not to need all that extra space), where a process records the checksum (md5 will be fine) and rdiffs the stream against the last full that I have offsite. The retention then rotates so that the last full is patched with the last diff against it and checksum-verified, after which the intervening diffs and the previous full drop off the rotation.
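The checksum-while-streaming step can be sketched in a few lines of Python. This is only an illustration of the idea, not anything duplicity or rdiff ships; the function name and chunk size are my own, and hashlib's md5 stands in for whatever checksummer you prefer:

```python
import hashlib


def stream_checksum(stream, out, chunk_size=1 << 20):
    """Copy a dump stream to `out` while computing its MD5 on the fly,
    so the (potentially large) image never needs a second pass or a
    temporary full copy just to get its checksum recorded."""
    digest = hashlib.md5()
    while True:
        block = stream.read(chunk_size)
        if not block:
            break
        digest.update(block)   # checksum the bytes as they pass through
        out.write(block)       # and forward them unchanged
    return digest.hexdigest()
```

In the scheme above, `stream` would be the pipe from the e2fs dump and `out` the pipe feeding rdiff, so the checksum is recorded without needing the extra disk space the dump would otherwise occupy.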

So in theory, I would need the machine being backed up, sufficient temp space to do the rdiff of a 2 GB fsdump, and of course a safety margin. On the remote system: sufficient space to hold 2 compressed full images of the system, plus the compressed delta * the days between full consolidations * 2, plus overhead for running rdiff on a full image and patching a full image.
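That remote-space budget is just arithmetic, so here it is as a one-line function. The sample numbers below (1 GB compressed full, 50 MB daily delta, weekly consolidation, 1 GB working overhead) are purely illustrative, not measurements:

```python
def remote_space_needed(full_mb, delta_mb, days_between_fulls, overhead_mb):
    """Remote-space budget from the scheme above: two compressed fulls,
    plus a delta per day for the rotation window (kept twice over),
    plus working room for rdiff-ing and patching a full image."""
    return 2 * full_mb + delta_mb * days_between_fulls * 2 + overhead_mb


# e.g. 1024 MB full, 50 MB delta, 7-day cycle, 1024 MB overhead -> 3772 MB
print(remote_space_needed(1024, 50, 7, 1024))
```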

Paul Harris wrote:
2009/8/9 Gabriel Ambuehl <address@hidden>

    On 9.8.09 David Stanaway wrote:
    > E.g.: I have a logfile which gets rotated to logfile.1; that is the
    > same as logfile in the previous backup, so I don't need to send it
    > again.
    > E.g.: I have some family pics that got emailed to me in my Family
    > folder, and I fwd the email to someone else. The MIME-encoded
    > attachment data is the same. I haven't tested this, but I would
    > think that with a solid file (tar or fs dump) this kind of
    > duplication of data would drop out.

    I would assume that these would get compressed away, but only if
    you had a really giant compression dictionary?

<wild half-baked idea>

fingerprint all the files, and then when it comes to storage, you only store each fingerprinted file once.

so if you have 5 copies of a file, or the file moves around, then it's only backed up once.

as for log files, that could be dealt with more nicely if (e.g.) the fingerprints were done in chunks. that way the first half of a log file would only be backed up once.

</wild half-baked idea>

the first issue would be correctly checking for hash collisions, so that two different chunks of data that coincidentally share the same fingerprint don't only get half backed up
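The chunk-fingerprinting idea, including the collision guard, fits in a small sketch. This is a toy, not a backup tool: fixed-size chunks, an in-memory dict as the "global pool", and SHA-256 as the fingerprint, with a byte-for-byte comparison on every hash hit so a coincidental collision is detected rather than silently deduplicated:

```python
import hashlib


class ChunkStore:
    """Toy chunk-level dedup: fixed-size chunks keyed by SHA-256,
    with a byte comparison on hash hits to guard against collisions."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # fingerprint -> chunk bytes (the shared pool)

    def add(self, data):
        """Store `data`; return its 'recipe' (the list of fingerprints)."""
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            fp = hashlib.sha256(chunk).hexdigest()
            stored = self.chunks.get(fp)
            if stored is not None and stored != chunk:
                # two different chunks share a fingerprint: refuse to
                # dedup, rather than back up only one of them
                raise RuntimeError("fingerprint collision detected")
            self.chunks[fp] = chunk  # no-op if the chunk is already pooled
            recipe.append(fp)
        return recipe

    def restore(self, recipe):
        """Rebuild a file's bytes from its recipe."""
        return b"".join(self.chunks[fp] for fp in recipe)
```

With chunking, two files that share a prefix (like a log file and its rotated copy) share their leading chunks in the pool, so only the tail of the newer file costs any storage. A real tool would use content-defined chunk boundaries so that an insertion near the start doesn't shift every later chunk, but that is beyond this sketch.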
