monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Monotone-devel] Re: [cdv-devel] more merging stuff (bit long...)


From: Bram Cohen
Subject: [Monotone-devel] Re: [cdv-devel] more merging stuff (bit long...)
Date: Sun, 7 Aug 2005 05:47:54 -0700 (PDT)

Nathaniel Smith wrote:

> I set myself a toy problem a few days ago: is there a really, truly,
> right way to merge two heads of an arbitrary DAG, when the object
> being merged is as simple as possible: a single scalar value?

The state of the art of version control isn't exactly what you'd call
advanced :-)

> Since the goal was to be "really, truly, right", I had to figure out
> what exactly that meant... basically, what I'm calling a "user model"
> -- a formal definition of how the user thinks about merging, to give
> an operational definition of "should conflict" and "should clean
> merge".  My rules are these:
>   1) whenever a user explicitly sets the value, they express a claim
>      that their setting is superior to the old setting
>   2) whenever a user chooses to commit a new revision, they implicitly
>      affirm the validity of the decisions that led to that revision's
>      parents
>     Corollary of (1) and (2): whenever a user explicitly sets the
>      value, they express that they consider their new setting to be
>      superior to _all_ old settings

This last one I believe to be incorrect in certain specific circumstances
(and they're the ones causing you grief in edge cases). Specifically, if a
user sets to a previously held value, then they're affirming the
superiority of that value to everything which was held between the last
time that value held and now, but *not* claiming that the current setting
of that value is superior to anything which overrode the old setting of
that value.

Specifically, in this case:

a
|\
b \
|  \
a   c

The result should be c, since the b is toasted by a revision to a. This is
in contrast to the following:

a
|\
b \
|  \
d   c

which is a conflict.

I call this behavior 'implicit undo'. It's very important behavior, since
end users generally revert changes by manually setting back to old
versions rather than explaining to the version control system what they're
doing.

The new codeville merge code I linked to recently contains code for doing
implicit undo on a single binary value (whether a line is included or
not). I *think* that approach can be generalized to scalar values by
running it over a tree multiple times for each value which has ever
occured, with the two values set to a/not a, b/not b, etc. If there's only
one value which should be present, it's set to that value, if there are
multiple ones, you have a conflict. I *think* it's impossible to have no
values set, but that's an open conjecture.

Unfortunately, I don't have a particularly clear intuitive justification
for my single binary value merge code, much less a formalization of it or
a real analysis of how it generalizes to multiple values. I tried to do
that and failed. If you get somewhere with it, please post your results!

As I mentioned previously, this results in one very bizarre case:

  a
 / \
/   \
b    b
|\  /|
a \/ a
| /\ |
|/  \|
b    b

On both sides we have one do and one implicit undo, so both sides merged
clean to 'do', but when you put them together there are undos of both dos,
so the result should clean merge to a, even though both ancestors are b.
The best way to treat this case is that there's only a single value merged
to, but a warning is escalated to the user that data may have become
inconsistent, since this value may not have appeared alongside other
values set to other things anywhere in the history.

That introduces a new funky warning case, but that warning represents the
reality of the situation, and unfortunately can't be simply glossed over.

> I'm not actually sure what the right semantics would be, though.  If
> we're merging:
>     |
>     a
>    / \
>   b   b
>    \ / \
>     b   c
> Should that be a clean merge?

No. (If it were, that would be 'convergent' behavior, by the way.)
Consider if the c were changed to an a - then the result would be b,
because the a is just an implicit undo to the old a. Clearly the b on the
left should carry some weight.

> Do you still have the same opinion if the graph is:
>     |
>     a
>     |
>     b
>    / \
>   c   b
>   |  / \
>   b  b  c
>   \ /
>    b

In this case the b overriding c was an undo, rather than a new value, so
it has null effect. The c should win. This is a case where having undos
makes the correct behavior clearer.

> Supporting these cases may irresistably lead back to ambiguous clean,
> as well:
>      |
>      a
>     / \
>    b*  c*
>   / \ / \
>  c*  X   b*
>   \ / \ /
>    c   b

This should be a conflict. Analyzing it from the point of view of c/not c
and b/not b shows that both of them win. Strangely, when analyzing that
way this case doesn't even stand out as being notable, despite being a
very funky edge case for other techniques.

> The other funky case is this thing (any clever name suggestions?):
>     a
>    / \
>   b*  c*
>    \ / \
>     c*  d*

d should clearly win this one.

> Merging here will give a conflict, with my algorithm; 3-way merge
> would resolve it cleanly.  Polling people on #monotone and #revctrl,
> the consensus seems to be that they agree with 3-way merge, but giving
> a conflict is really not _that_ bad.  (It also seems to cause some
> funky effects with darcs-merge; see zooko's comments on #revctrl and
> darcs-users.)

Got a link for that?

> I suspect the benefit of the proofs is obvious to anyone who has spent
> much time banging their head against this problem; until a few days
> ago I was skeptical there _was_ a way to design a merge algorithm that
> didn't run into problems like ambiguous clean merge.

Yeah, it's clear that old codeville merge has some good ideas but clear
deficiencies as well. Those deficiencies aren't in there because I'm lazy,
but because fixing those problems is *hard*. Being able to show that we've
stamped out all the funkiness would be very welcome.

> I'm still skeptical, of course, until people read this; merging is
> like crypto, you can't trust anything until everyone's tried to break
> it... so let's say I'm cautiously optimistic :-).  If this holds up,
> I'm quite happy; between the user model and the proofs, I'm far more
> confident that this does something sensible in all cases and has no
> lurking edge cases than I have been in any previous algorithm.  The
> few problem cases I know of display a pleasing conservatism -- perhaps
> more cautious than they need to be, but even if they do cause an
> occasional unnecessary conflict, once the conflict is resolved it
> should stay resolved.
>
> So... do your worst!

I think your proofs are good but your user model is flawed because it
doesn't include implicit undo. I propose that my suggested algorithm does
everything right, but it's complicated and I have to formal motivation for
it, much less fleshed out proofs.

Perhaps once this is all sorted out we can write a paper entitled 'How to
version a single value'.

-Bram





reply via email to

[Prev in Thread] Current Thread [Next in Thread]