Re: [Monotone-devel] README library list and log command fixes


From: Nathaniel Smith
Subject: Re: [Monotone-devel] README library list and log command fixes
Date: Tue, 21 Oct 2003 14:21:06 -0700
User-agent: Mutt/1.5.4i

On Mon, Oct 20, 2003 at 12:20:20AM -0400, graydon hoare wrote:
> Nathaniel Smith <address@hidden> writes:
> > Though, question: what does happen with the solution below if I have
> > 
> >  <stuff> -> A1 --> B1 -> B2
> >               \         /
> >                `-> A2 -'
> > 
> > and I commit B2 to my depot "B"?  we do a cert against B1, obviously,
> > and a cert against A2, obviously... but do we then follow A2's
> > ancestry up until we reach another revision in B?
> 
> nope. at least, not now. I don't think we should. do you?
> 
> currently, the least ancestor P of B2 which we find listed in B, we
> use as the base for the delta we post. P -> B2 is posted both as a new
> delta and as a new ancestry cert, generated by us. we also post
> (unmodified) copies of all certs found in our database on B2.
> 
> the change I'm considering, to address your previous concern, is to
> post all the ancestry certs (and possibly all the deltas) between P
> and B2. since the aggregation of edges can also be seen as a
> "feature", I'll probably put this under the control of a hook. I am
> not suggesting chasing any other incoming edges to B2. if B1 is in B,
> then P == B1 and that's as far back as we'll go.

Well, just compare to

 <stuff> -> A1 -----> B1 ----> B2
              \               /
               `-> B3 -> A2 -'

, I guess.  Presumably here we would want to post a B3 -> B2 cert.
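
To make the comparison concrete, here is a minimal sketch (not monotone's
actual code) of what chasing every incoming edge of B2 would mean: walk
parent edges back from B2 and stop each path at the first revision that is
already in depot B. The function name `nearest_depot_ancestors`, the
dict-of-parents graph, and the assumption that A1 is not in B are all
hypothetical, chosen only to mirror the diagram above.

```python
from collections import deque

def nearest_depot_ancestors(parents, start, depot):
    """Walk parent edges back from `start`; stop each path at the first
    revision found in `depot` and collect those revisions."""
    found = set()
    seen = {start}
    queue = deque(parents.get(start, ()))
    while queue:
        rev = queue.popleft()
        if rev in seen:
            continue
        seen.add(rev)
        if rev in depot:
            # Already posted in the depot: this path needs no further chasing.
            found.add(rev)
            continue
        queue.extend(parents.get(rev, ()))
    return found

# Second diagram: A1 -> B1 -> B2 and A1 -> B3 -> A2 -> B2.
parents = {
    "B2": ["B1", "A2"],
    "B1": ["A1"],
    "A2": ["B3"],
    "B3": ["A1"],
}
depot_b = {"B1", "B3"}  # assumption: B1 and B3 are already in depot B
bases = nearest_depot_ancestors(parents, "B2", depot_b)
print(sorted(bases))
```

Under these assumptions the walk returns both B1 and B3, so both a
B1 -> B2 and a B3 -> B2 cert would be candidates for posting. The sketch
does not check whether one returned revision is an ancestor of another.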

> now when bob runs "monotone merge", LCA(C,D) will find A, not B, due
> to the recursive behavior I added to handle criss-cross merges. one
> thing leads to another!
> 
> how can this be resolved?
> 
>    1. bob can avoid fetching from his own depot. this will make it
>       less likely (but not impossible) that he ever encounters the A
>       -> C edge, or at least less likely that he encounters it before
>       he has to merge C and D.

I think someone, say jim, tracking both alice and bob's depots will
see both edges, no?  So bob might be all right, but jim will have
problems.

>    2. bob can ignore the situation since A is probably not a bad LCA
>       anyways, and monotone's merger will *likely* resolve the
>       repeated part of the merge3(A,D,C) gracefully (the A->B change
>       appearing twice).
> 
>    3. as you hint, we can try to teach the LCA algorithm to "ignore
>       these redundant edges". either by making the A -> C edge somehow
>       specially marked, or adding a secondary cert telling LCA to ignore
>       it, or by committing a local (non-posted) inhibitor, concurrently
>       with generating the A -> C edge for posting.

Marking is tricky, because of course these ancestry certs are supposed
to be used for ancestry; it's just that they're only supposed to be
used in the absence of more detailed information.  I don't think the
local inhibitor works for jim either.
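
One way to read "only used in the absence of more detailed information" is
as a reachability test: an edge is redundant exactly when its target can
also be reached from its source by some longer path. The sketch below
illustrates that idea only; it is not how monotone's LCA works, and the
graph (A -> B, B -> C, B -> D, plus the synthesized A -> C) is my guess at
the situation under discussion. `is_redundant` is a hypothetical helper.

```python
def is_redundant(children, src, dst):
    """Return True if `dst` is reachable from `src` without using the
    direct src -> dst edge, i.e. the direct edge adds no ancestry."""
    # Start from every child of src except dst itself.
    stack = [c for c in children.get(src, ()) if c != dst]
    seen = set()
    while stack:
        rev = stack.pop()
        if rev == dst:
            return True
        if rev in seen:
            continue
        seen.add(rev)
        stack.extend(children.get(rev, ()))
    return False

# Assumed graph: A -> B -> C, B -> D, and the aggregated edge A -> C.
children = {
    "A": ["B", "C"],
    "B": ["C", "D"],
}
print(is_redundant(children, "A", "C"))  # the aggregated edge
print(is_redundant(children, "A", "B"))  # an ordinary edge
```

Note that this only helps someone who has actually fetched the detailed
A -> B -> C path; a reader who sees only A -> C still has to use it.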

>    4. as you are more broadly suggesting all through this email: never
>       synthesize the A -> C ancestry cert, at all. repost what you
>       have and forget about aggregation.

Yeah, I guess that is where I ended up arguing myself :-)

> > Should we be worried about the transitive trust issues here?  If I
> > automatically and uncritically create all these certs based on other
> > people's ancestry certs, is that bad? 
> 
> well.. perhaps a little. ultimately if someone is sending me code, and
> I'm incorporating it because I trust them, they have a way into the
> code I share with my friends anyways, whether done with certs, or
> simply code of theirs which I merged. but if it creeps people out
> synthesizing certs based on certs, we should turn it off by default.

The case I'm worried about is where you have someone, let's call them
"graydon", who is a core developer, and someone else, let's call them
"fly-by-night flo", who pops up and writes a single patch.  flo commits
her patch to a depot somewhere and asks graydon to review it; graydon
does so, decides that it is good, marks it "approved", and reposts it
to his depot.  (Or if flo was posting to a communal depot, like a
world-writeable NNTP group, graydon just marks it approved and is
done.)  Now everyone needs to trust flo's key, even though flo has
only ever submitted one patch, and it was only accepted after being
reviewed.

There are cases where this doesn't matter.  If graydon merges flo's
patch in with propagate, for instance, then nobody has to trust flo's
key; if graydon implements approval by recert'ing flo's ancestry with
his key, then nobody has to trust flo's key.  Maybe this sort of thing
is sufficient.

> there are 2 problems which occur in the "reposting" strategy:
> 
>   - someone tracking my depot needs all the public keys of people I am
>     absorbing changes of, or else parts of the graph vanish.
> 
>   - I post more bytes, overall, since I'm not fusing edges.
> 
> since neither is a major problem, and more importantly neither is a
> *subtle* problem, I think that makes "reposting" desirable. I should
> probably also point out, in case these cases sound too painful to
> bear, that the "monotone propagate" command can fake something quite
> similar to the aggregation strategy, using different branch names to
> separate the aggregate stream from the reposted. that was its intent
> anyways. 

propagate is indeed quite useful here.

Some numbers:
  number of entries in top-level gcc ChangeLog, on HEAD branch: 1840
     total space to cert this many edges: ~500KiB
  total number of edges found by cvs_import in the gcc repository,
     after ~24 hours of cpu time: ~10,000 (I have no idea how much of an
                                          underestimate this is)
     total space to cert this many edges: ~3MiB
  size of a single checkout of gcc (and thus the minimal size of a
     depot): ~200MiB
  number of edges required to make reposting add 10% overhead: ~70,000

(This last number is something of a cause for concern on its own, I
guess; it'd be nice if I could put up a depot with some little patches
to gcc without uploading 200 meg, and forcing everyone else to
download that again just to get any of my little patches.  An argument
for eventually having smarter servers; an rsyncy protocol would solve
the download part of this nicely...)
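
For anyone who wants to check the ~70,000 figure, here is the arithmetic
as a small script. It uses only the numbers listed above; the variable
names are mine, and the per-edge cert size is derived from the two
measurements rather than measured directly.

```python
KIB = 1024
MIB = 1024 * KIB

# Per-edge cert cost implied by the two measurements above.
per_edge_changelog = 500 * KIB / 1840    # ChangeLog entries on HEAD
per_edge_import = 3 * MIB / 10_000       # edges found by cvs_import

# 10% of the minimal depot size (one gcc checkout).
budget = 0.10 * 200 * MIB

edges_changelog = budget / per_edge_changelog
edges_import = budget / per_edge_import

for label, per_edge, edges in [
    ("ChangeLog estimate", per_edge_changelog, edges_changelog),
    ("cvs_import estimate", per_edge_import, edges_import),
]:
    print(f"{label}: ~{per_edge:.0f} bytes/edge, "
          f"~{edges:,.0f} edges for 10% overhead")
```

Both estimates come out at roughly 300 bytes per edge, which puts the 10%
threshold somewhere between about 67,000 and 75,000 edges, consistent
with the ~70,000 quoted above.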

-- Nathaniel

-- 
"...these, like all words, have single, decontextualized meanings: everyone
knows what each of these words means, everyone knows what constitutes an
instance of each of their referents.  Language is fixed.  Meaning is
certain.  Santa Claus comes down the chimney at midnight on December 24."
  -- The Language War, Robin Lakoff



