monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Monotone-devel] Re: RFC: Fake IDs


From: Nathaniel Smith
Subject: Re: [Monotone-devel] Re: RFC: Fake IDs
Date: Tue, 18 Jul 2006 22:33:20 -0700
User-agent: Mutt/1.5.11+cvs20060403

On Tue, Jul 18, 2006 at 07:24:19PM -0700, Graydon Hoare wrote:
> Nathaniel Smith wrote:
> >If we hash some text, and then compare it to another equal-length
> >bitstring... the collision probability is not affected by whether that
> >other bitstrings was generated by pounding on the keyboard or by SHA1
> >of some other text.  Am I missing something in this analysis?
> 
> I'm not concerned with collision probabilities per-se. I'm concerned 
> with using collision probabilities as some proxy argument for what you 
> actually want in this case. A "fake ID" is supposed to be "an ID I 
> currently don't have in my database, or in the set of fake IDs I've 
> currently chosen during this run". Why hash strings and wonder about 
> probability when you can just check your database?

Well, as Zack mentioned, it is rather a pain to pass that database
handle around everywhere (and it would be nice to pass it less
places!).  Furthermore, there are times when you don't really mean
that anyway -- e.g., in the pluck code, I use 3 fake rids (for marking
the synthetic ancestor/child/workspace rosters), and there _are_ no
other rids in play, all I actually want is 3 rids that are distinct
from each other.  (Or maybe I actually use a real rid for one of
those, since I happen to have it handy... it doesn't really matter,
the point is that the constraint doesn't care about the rest of the
db.)

These are maybe too low-level details to bring into the discussion,
though.  I guess my real reason to dislike the idea of checking the db
is... well, an example is probably clearset.  When we go to store a
(file id, file text) pair in the database, we currently check to see
if we already have something stored under that file id, and if so,
just assume that the text has already been stored and exit early.  In
the past, people have very seriously suggested that what we ought to
do in this case is to instead, read out the text stored in the db, and
double-check that the two texts with the same hash are in fact
identical.

We do not do this because it creates code complexity and extra
resource usage, and has essentially non-existent gain -- if it were
effectively possible for those texts to be non-equal, then the entire
design would be horribly broken, not just our write_file_to_db
routine.

So for me... this is basically the same as the fake rid case, in that
in both we trade some simplicity and performance for more assurance in
case SHA1 turns out to suck.  I would feel a _lot_ more comfortable
with your argument if I saw some way I could agree with you, without
having to then also agree with the people who want to double-check
hashes all the time.  If that makes any sense...

-- Nathaniel

-- 
.i dei jitfa fanmo xatra




reply via email to

[Prev in Thread] Current Thread [Next in Thread]