monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

3-way merge considered harmful (was Re: [Monotone-devel] merge weirdness


From: Nathaniel Smith
Subject: 3-way merge considered harmful (was Re: [Monotone-devel] merge weirdness...)
Date: Sun, 1 May 2005 00:29:48 -0700
User-agent: Mutt/1.5.9i

On Wed, Apr 27, 2005 at 03:54:14AM -0700, Nathaniel Smith wrote:
> For merging, monotone does _not_ use the closest common ancestor.  The
> reason is that doing so could silently corrupt your code.
> 
> The simplest example is the famous "criss-cross merge":

The last few days I've been thinking more about this and some other
pathological 3-way merge cases we've known about for a while,
re-considering and re-digesting in the light of the recent
#revctrl-inspired explosion of understanding of merge algorithms.
The rename bug Richard just found is something of a final straw.  I no
longer believe 3-way merge is a viable technique for modern VCSes.

To expand: I'll assume everyone knows about the criss-cross merge
case, which forces us to choose more distant ancestors in some cases,
or else risk silently corrupting code.  There are some limited
workarounds to this (per-file ancestor selection, for instance), but
they merely reduce the frequency of the problem, rather than eliminate
it.

Here's another pathological case for 3-way merge:
   A
   |
   B
  / \
 C   D
Suppose that a file was added on the A->B edge, and then removed again
in B->C, while it was left alone on the B->D edge.  We want to merge
C and D.  Let's assume that for some reason we decide to use A as a
common ancestor.  What will happen?  Our file does not exist in either
A or C, so when comparing A and C 3-way merge will think that nothing
has happened.  Our file _does_ exist in D, though, so when comparing A
and D, 3-way merge will decide that a file has been added.  Therefore,
says 3-way merge, this file should exist in the new merged node.
But, this is obviously wrong; the file was deleted between B and C,
and this delete should have caused a delete of D's copy as well.

This was the version originally discovered some time ago, and codified
in monotone's t_merge_add_del.at test.  There is a corresponding
version for content changes -- suppose that a file was modified in B,
and then this change was reverted in C, but left alone in D.  The same
problem occurs; the bad code mysteriously reappears in the merge of C
and D.

When we discovered this a while ago, we sort of scratched our heads,
tossed around some ideas about "per-hunk ancestor selection" and
such, and shrugged and ignored it for a while.  Richard just
discovered a related problem that is harder to ignore.  Take the same
graph as above, and suppose that instead of being deleted in C, our
file was _renamed_ in C.  Now, when we go to merge, the A->C changeset
says file "foo" was added, the A->D changeset says file "bar" was
added, and there is no way to tell that these files are really the
same file, so they are both added.  Now we have a single logical file
that exists twice in the same tree.  (This has now actually happened
in the monotone tree, which means that we have another ancestry
rebuild in our future.)  Notice the problem doesn't actually get much
better in a system where files are assigned unique persistent ids,
rather than having them calculated on the fly as monotone does (or if
monotone used a better way to calculate them on the fly); it would
make this case marginally better, in that 3-way merge would signal a
conflict instead of doing the incorrect thing, but this is not that
much of an improvement.  (And is exactly the sort of weird id-only
conflict that monotone _doesn't_ store peristent file ids so it can
_avoid_.)

(This problem seems specific to the semantics of "rename"; there
doesn't seem to an analogue for file content, because file content
cannot be moved, only created or destroyed.  If anyone wanted to
attempt supporting content movement despite its horrible edge cases
and inherent potential for misbehavior, though, then this is yet
another problem they would need some solution to.)

So, clearly, picking A as an ancestor in the above graph is terrible.

This leaves us in a bad situation, though.  The possibility of
criss-cross merge means that we are sometimes (or often) forced to
choose ancestors that are far away, or else risk corrupting user data;
the above cases mean that we are sometimes (or often?) forced to
choose ancestors that are _close_, or else risk corrupting user data.
I am fairly confident that one can construct more complicated graph
topologies embedding the above example, in which one is _forced_ by
criss-cross rules to choose A as an ancestor.  (Certainly it is
possible for monotone's current criss-cross avoidance algorithm; I
believe it is possible in general, but it's harder to check
as-yet-undiscovered criss-cross avoidance algorithms in general.
Per-file ancestors don't seem to help; criss-cross and mysterious
unreversion both can happen at the hunk level, and thus can both be
going on in the very same file at the very same time.)

Conclusion 1: While the thought of rewriting change_set.cc again fills
me with something almost, but not quite, entirely unlike joy, I don't
see any way to fix Richard's bug without some major surgery, and we
need to not generate nonsense changesets.  At which point, I guess, we
might as well implement:

Conclusion 2: 3-way merge is an inappropriate choice of merge
algorithm for a modern VCS.

Fortunately, it seems like codeville-merge is a viable replacement
here (it can be applied to both content merges and tree rearrangement
merges), and with some recent improvements that Bram and some other
people (including me) have been playing with, it may (I'm not sure
yet) be strictly more powerful than 3-way merge.

-- Nathaniel

-- 
"...these, like all words, have single, decontextualized meanings: everyone
knows what each of these words means, everyone knows what constitutes an
instance of each of their referents.  Language is fixed.  Meaning is
certain.  Santa Claus comes down the chimney at midnight on December 24."
  -- The Language War, Robin Lakoff




reply via email to

[Prev in Thread] Current Thread [Next in Thread]