rdiff-backup-users
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [rdiff-backup-users] Interesting write-up of 'compare-by-hash'.


From: Robert Knighten
Subject: Re: [rdiff-backup-users] Interesting write-up of 'compare-by-hash'.
Date: Wed, 28 Jan 2004 18:25:41 -0800

David S. writes:
 > > > 
 > > My impression was the write-up was talking about an inadvertant
 > > hash-collision, not hackers.
 > > 
 > > Since the hashing process is lossy (ie. non-reversable), then it is
 > > possible that two totally different data sets could generate the same
 > > hash, and in turn invalidate the backup checksum check.
 > > 
 > > I don't know what the odds are of this happening with rdiff-backup.
 > 
 > The presence or absence of collision is a property of the hash function
 > used - sha1, md5, whatever - and not 'rdiff-backup'.  Actually, any
 > hash function will have infinitely many collisions.  The proof is an
 > exercise for the reader ...
 > 
 > > 
 > > I assume that they are exceedingly small, but not zero.
 > 
 > Over the entire domain of a hash function, the probability is more
 > like one than zero.  Commonly used hashes are commonly used because they 
 > have infrequent collision over their practical domain.  That's an empirical
 > observation, however, not established, as far as I know, by any sort
 > of proof.
 > 

No, it is very small and typically easy to estimate.  Most hash functions are
designed to be close to uniform, i. e. the frequency of any particular hash
value is the same as all others.  The result is that for e.g. MD5 which
produces a 128-bit integer the probability that two messages chosen at random
will have the same hash is very close to 2^-128 ~ 3 * 10^-39.

-- 
Robert L. Knighten
address@hidden





reply via email to

[Prev in Thread] Current Thread [Next in Thread]