monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Monotone-devel] Re: branch review for net.venge.monotone.multihead


From: Nathaniel Smith
Subject: Re: [Monotone-devel] Re: branch review for net.venge.monotone.multihead
Date: Fri, 7 Jul 2006 16:01:46 -0700
User-agent: Mutt/1.5.11+cvs20060403

On Fri, Jul 07, 2006 at 01:08:02AM -0700, Zack Weinberg wrote:
> On 7/7/06, Nathaniel Smith <address@hidden> wrote:
> >          rid_map_iter target
> >            = heads_for_ancestor.insert(std::make_pair(ancestor,
> >                                                       set<revision_id>()))
> >              .first;
> >
> >          I(target->first == ancestor);
> >
> >          target->second.insert(*i);
> >          target->second.insert(*j);
> >^^ this code seems like it would be clearer as [...]
> 
> I think you missed the point of having a set<> on the RHS.  It's
> precisely to deal with the case you called out at the end, where one
> parent has more than two ancestors.  I'm not sure I coded this bit
> right -- what I was trying to write was something like this:
> 
> if not ancestor in heads_for_ancestor: heads_for_ancestor[ancestor] = {}
> heads_for_ancestor[ancestor][i] = 1
> heads_for_ancestor[ancestor][j] = 1
> 
> Does that help?

Ah, I see.  That's, mm, clever :-).

> [...]
> >This overall structure of passes, each containing arbitrarily many
> >merges, seems unnecessarily baroque, and adds additional code paths
> >that may contain bugs.  (It is not at all obvious, for instance, what
> >happens when there are multiple candidate merge pairs that share the
> >same merge ancestor.)
> 
> It's _supposed_ to fall back to the old behavior in that situation...
> The inner loop there is in fact the old loop, word for word, and so
> _should_ be doing what it did, except on a restricted set of merge
> candidates.  Thus for instance
> 
> > Or what about:
> >
> >    A
> >   / \
> >  B   C
> > / \ / \
> >D   E   F
> >
> >Here it looks like it will merge (D, E) and (E, F) in one pass, which
> >is a bit silly; merging (D, E) and then the result of that with F, or
> >vice versa, would be better.
> 
> I'm 99% sure it does do merge (D,E) and then merge(m(D,E), F) or vice versa.

I think this is wrong.  You do:
  for each surviving CA:
    for each set of mergeable things under it:
      merge all those together

(D, E) and (E, F) have different CA's, so they get hit on different
passes through the outer loop, not the inner loop.

Now that I see what you're actually doing here, the user chatter
looks even more strange.  Say you have two CAs that survive, one of
which has 3 children and one of which has 2 parents.  The way the loop
is structured, and with the line:
            P(F("merge %d / %d") % count % ancestors.size());
we end up printing:

merge 1 / 2
merge 2 / 2
merge 1 / 2

That line is only correct if we assume that every ancestor corresponds
to exactly one merge we want to do.

> >I think it would be better to just go through the loop, each time
> >picking _some_ pair of heads that is as good as any to merge next,
> >merging them, and then going through again from the top.  Simpler to
> >reason about, and easier to describe to the user what's going on as
> >well (the second number in the current "merge %d / %d" message is
> >quite meaningless to a user).
> 
> I'm a little worried about the cost of recalculating all the tables
> and stuff on each iteration, though arguably if it's a problem, you
> have bigger problems.

Right.  The thing is, this logic only gets hit at all if you have at
least 3 heads, which is already the uncommon case.  The case where you
can avoid recalculating is even more rare; you need at least 4 heads,
and in a particular configuration.  Not only is this uncommon enough
that speed is unlikely to be a problem in practice, but this is also
exactly the case where users will forgive us -- if you have a horrible
pile of things to merge, you are perfectly happy to see your tool
tell you that it's going to figure out how to make this as painless as
possible, then spin for a few seconds doing so :-).

> Tangentially, did you see IRC backlog of my discussion with Graydon
> over the weekend about the possibility of trying all possible merge
> orders?  What do you think of that idea?

I did; don't have any strong opinion on it.  I have some vague
intuition that the simple approach here gives you 90% of the solution,
and a try-until-the-conflicts-go-away approach gives you a bias for
falsely clean merges -- though hopefully ones merger has a low enough
error rate that this is not that big a deal.

I also observe that there are a ridiculous number of merge orders for
reasonable sized head sets -- the number is larger than the n-1 Catalan
number[1], which already grows like 4^n.

n for us is generally small, though, so *shrug*.

[1] http://en.wikipedia.org/wiki/Catalan_number , and "larger" because
    we allow (some) permutations: ((ab)(cd)) and ((ad)(bc)) are both
    possible merge orders.  But it's not all permutations, and I'm too
    lazy to work out the combinatorics properly right now.

-- Nathaniel

-- 
"...these, like all words, have single, decontextualized meanings: everyone
knows what each of these words means, everyone knows what constitutes an
instance of each of their referents.  Language is fixed.  Meaning is
certain.  Santa Claus comes down the chimney at midnight on December 24."
  -- The Language War, Robin Lakoff




reply via email to

[Prev in Thread] Current Thread [Next in Thread]