
From: Donovan Baarda
Subject: [rdiff-backup-users] Re: more info on 25gig files
Date: Wed, 06 Jul 2005 13:35:54 -0700

On Mon, 2005-07-04 at 08:03, address@hidden wrote:
[...]
> Speaking of variable seeds, the best way to implement it elegantly and
> without relying on a patched MD4 (or whatever) implementation is to use
> salt, i.e., by prepending the random string to the file data and running
> the usual hash.

In all the md4sum implementations I've seen, the struct used is not
opaque, so you can pre-seed by directly setting the A, B, C, D state
values. I'm not sure which approach is better...
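To make the salting alternative concrete, here is a minimal sketch of the "prepend a random seed and run the usual hash" approach. MD5 stands in for MD4 below, since hashlib's MD4 support depends on the OpenSSL build; the function name and seed length are illustrative, not anything from librsync.

```python
import hashlib
import os

def salted_hash(seed: bytes, data: bytes) -> bytes:
    # Prepend the per-session seed ("salt") to the block data and run
    # the ordinary hash -- no patched MD4 implementation required.
    # MD5 is used here only as a stand-in for MD4.
    h = hashlib.md5()
    h.update(seed)
    h.update(data)
    return h.digest()

data = b"some block contents"
s1 = os.urandom(16)
s2 = os.urandom(16)

# A different seed yields a different, unpredictable block signature,
# so precomputed collisions against a fixed seed no longer apply.
assert salted_hash(s1, data) != salted_hash(s2, data)
# The same seed is deterministic, as a signature must be.
assert salted_hash(s1, data) == salted_hash(s1, data)
```

The pre-seeding variant (setting A, B, C, D directly) avoids hashing the extra seed bytes but ties you to a non-opaque MD4 struct; salting works with any hash API.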

> >>* "rdiff signature" on large cached file:  109MB/sec
> >>* OpenSSL MD5, 1KB blocks:                 246MB/sec
> >>               8KB blocks:                 326MB/sec
> >>* OpenSSL SHA1, 1KB blocks:                148MB/sec
> >>                8KB blocks:                174MB/sec
> > 
> > 
> > You are not
> > comparing apples with apples here; the rdiff signature includes rollsums
> > etc, and probably a crappy md4sum implementation.
> 
> My point exactly: it follows that switching to OpenSSL's MD5 won't slow
> things down compared to the current code.

Yeah, but switching to OpenSSL's MD4 will make it faster, without
making it any weaker.

> > All the arguments for md5 or sha1 vs md4 revolve around extra protection
> > against malicious attacks that craft hash collisions. In the case of
> > librsync, the biggest threat from malicious attacks does not revolve
> > around vulnerabilities in md4sum, but from the fixed seed. Changing to a
> > variable seed will solve these, 
> 
> I don't think MD4 was ever evaluated in the context of variable seeds.
> But it is an extremely weak hash function, so I would not be surprised
> if it turns out to be susceptible even to that. If the cost in
> performance is negligible and compatibility is broken anyway, why not
> use something less broken?

I'm unconvinced that the cost in performance is negligible. I'm also not
convinced that it is "extremely weak", particularly in the context of
our application.

> > Though if someone can demonstrate that a different hash has a
> > significantly better distribution over random data than md4, then maybe
> > it would be worth considering (ie, it would avoid accidental collisions
> > better).
> 
> I find this unlikely.

The crucial point is that for librsync, the important property of the
hash used is its distribution, not its "strength" (BTW, for those
wanting to convince me otherwise, I think the two are actually related).

> > Also, the "metahash" hash of all the blocksums worries me... it might
> > not be very reliable.
> 
> Merkle hash trees are quite similar and well-analyzed by the
> cryptographic community. In a Merkle hash tree you have a hash function
> compressing a 2n-bit string to an n-bit string, and you compute the
> n-bit hash of the whole file recursively: first you hash it into half
> the size by applying the hash to each aligned 2n-bit block, then to
> quarter of the size, etc. It has many wonderful uses. The only essential
> difference compared to our meta-hash is that the adversary can affect
> the partitioning of the raw data into chunks, but that shouldn't help him
> if he can't control the hash of those chunks. But if you want to be
> really calm in case of exotic cryptanalytic attacks, when computing the
> hash of a chunk represented by a packet, just include the chunk offset
> and chunk length in the input to the hash.

I wasn't that concerned about exotic cryptanalytic attacks, more about
random whole-file checksum collisions... particularly given the fact
that we are trying to use the whole-file checksum as a way of detecting
blocksum collisions... if the whole-file checksum is actually a checksum
of those blocksums...

I guess I'd need convincing that it's a reliable whole-file checksum,
and that the savings are worth it.
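For reference, the construction suggested in the quoted text can be sketched as follows: a flat, one-level analogue of a Merkle tree where the whole-file "metahash" is a hash over the blocksums, and each blocksum folds in the chunk's offset and length. MD5 again stands in for MD4, and the function names and block size are illustrative assumptions, not librsync's actual layout.

```python
import hashlib

def chunk_hash(data: bytes, offset: int, length: int) -> bytes:
    # Fold the chunk's offset and length into its hash, per the
    # suggestion above, so an attacker who can influence how the file
    # is partitioned into chunks gains nothing from it.
    h = hashlib.md5()
    h.update(offset.to_bytes(8, "big"))
    h.update(length.to_bytes(8, "big"))
    h.update(data)
    return h.digest()

def meta_hash(data: bytes, block_size: int = 4) -> bytes:
    # Whole-file "metahash": a hash over the sequence of blocksums
    # rather than over the raw file bytes.
    h = hashlib.md5()
    for off in range(0, len(data), block_size):
        chunk = data[off:off + block_size]
        h.update(chunk_hash(chunk, off, len(chunk)))
    return h.digest()

# Identical chunk contents at different offsets hash differently,
# which is exactly what binding in the offset buys you.
assert chunk_hash(b"aaaa", 0, 4) != chunk_hash(b"aaaa", 4, 4)
# The metahash is deterministic and order-sensitive.
assert meta_hash(b"aaaabbbb") == meta_hash(b"aaaabbbb")
assert meta_hash(b"aaaabbbb") != meta_hash(b"bbbbaaaa")
```

Note this sketch does nothing to address the random-collision concern above: if the whole-file check is itself built from the blocksums, a blocksum collision propagates into it, which is the reliability question being raised.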

-- 
Donovan Baarda <address@hidden>




