rdiff-backup-users
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [rdiff-backup-users] Interesting write-up of 'compare-by-hash'.


From: Greg Freemyer
Subject: Re: [rdiff-backup-users] Interesting write-up of 'compare-by-hash'.
Date: Wed, 28 Jan 2004 14:08:05 -0500

On Tue, 2004-01-27 at 21:33, Ben Escoto wrote:
> >>>>> Greg Freemyer <address@hidden>
> >>>>> wrote the following on Fri, 02 Jan 2004 14:39:18 -0500
> 
> > The writeup at:
> > 
> > http://www.usenix.org/events/hotos03/tech/full_papers/henson/henson_html/hash.html
> > 
> > in effect says that 'compare-by-hash' is not the perfect choice for
> > a rdiff-backup type of program.  Instead, it says some kind of state
> > information should be added to the algorithym to ensure accuracy.
> > 
> > I don't know that I am concerned, but I am curious if rdiff-backup
> > takes any precautions against hash-collisions.
> 
> No, rdiff-backup takes no special precautions.  As far as I can tell,
> the paper makes the point that even cryptographically strong hashes
> may be breakable in the future.  This is a pretty good objection,
> especially since librsync uses md5 instead of the stronger SHA1.
> 
> But in rdiff-backup the most an attacker could do would be to produce
> a file that had the same signature, causing rdiff-backup to skip over
> parts of the file.  But an attacker can already do much better than
> that: just use utime(), chmod, and such to make it look like your file
> hasn't changed.  Plus this method doesn't take years of research.
> 
> So anyway, rdiff-backup isn't intended to defeat an attacker trying to
> make it look like his files haven't changed.
> 
My impression was the write-up was talking about an inadvertant
hash-collision, not hackers.

Since the hashing process is lossy (ie. non-reversable), then it is
possible that two totally different data sets could generate the same
hash, and in turn invalidate the backup checksum check.

I don't know what the odds are of this happening with rdiff-backup.

I assume that they are exceedingly small, but not zero.

> 
> I did find this quote from the article interesting:
> 
> > BitKeeper keeps state about each file under source control and knows
> > what changes have been made since the last time each tree was
> > synchronized. When synchronizing, it sends only the differences
> > since the last synchronization occurred, in compressed form. In
> > comparison to rsync, BitKeeper provides similar and sometimes better
> > bandwidth usage when simply synchronizing two trees without
> > resorting to compare-by-hash.
> 
> Does anyone know how BitKeeper can send only the differences without
> using a compare-by-hash algorithm?  I don't see how this is possible
> unless BitKeeper has abandoned the usual edit w/arbitrary editor and
> then checkin scheme.
> 
I don't how bitkeeper does it, but I am familiar with drbd.  A network
based disk mirroring solution.

With drbd, if the network/cluster is up, then all writes are mirrored
immediately.

If the network is down, a dirty table is maintained.  Anytime a write is
done, the dirty table is flagged.

Then, when communication is re-established only the dirty blocks are
syncronized.

LVM snapshots also maintain a dirty block table (or Copy-On-Write
table).

It is obviously very non-trivial, but the ultimate way for a
rdiff-backup tool to work would be to somehow tie into a block level
dirty table.  The table would then be cleared when the block is backed
up.

As I said, LVM already has the concept of a dirty block table, so it is
not a totally crazy idea.  

Anybody need a doctorial thesis project?

Greg
-- 
Greg Freemyer





reply via email to

[Prev in Thread] Current Thread [Next in Thread]