[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Monotone-devel] Merge3
From: |
Torsten Rueger |
Subject: |
[Monotone-devel] Merge3 |
Date: |
Wed, 21 Apr 2004 13:28:56 +0300 |
User-agent: |
Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.7b) Gecko/20040316 |
Moi,
I've been researching merge algorithms for xml recently and together
with a colleague have found an algorithm that merges text with very good
results.
I am afraid that I can't publish my code (it's ruby anyway), due to
licensing reasons. But I'm happy describing the algorithm and if someone
wants to implement it in c++, I can help (offline). I just think
monotone is such a fresh wind to the topic of version control, i'd be
happy to contribute at least a little to this merging topic.
In merging one can must take one of the two aproaches:
-- add/remove
-- add/copy
Quite seperate from that is the issue of line based versus byte based.
The second is obviously more general (works on binary files) but more
importantly also more acurate (produces less conflicts for changes on
one line that can still be merged)
Add/remove is what unix diff/patch does (on a line basis). Add/copy is
what more recent algorithms do, namely xdelta and vcdiff.
Now the problem with add/remove is that almost by definition is does not
detect copies or moves. So that when a part of text is moved in one
branch and edited in another, a conflict occurs. Often quite unneccessarily.
Xdelta and vcdiff not only work on binary data, but detect copies. They
do this for efficiency reasons (the produced delta is smaller) and only
for diffing 2 files. Unfortunately, both can only apply diffs to the
exact original, so no 3 way merging. So that's what I wrote and decribe
here.
I'll make an example (if you're still reading, great, keep going), the
numbers are for explaining the algorithm later. The example is short,
the strings are actually too short, but it's just an example.
The merge is perfectly feasable and unambiguous. You can put each word
on a line and run it through diff3 to see it fail.
Original: <html><body>Stuff</body><head>Merge Example</head></html>
1 2 3 4 5 6 7 8 9
One: <html><head>Merge Example</head><body>Stuff</body></html>
1 <--5 6 7 8 <--2 3 4 <--9
Two: <html><body>Algorithm</body><head>Example</head></html>
1 2 <--10 <--4 5 <-- 7 8 9
Merge: <html><head>Example</head><body>Algorithm</body></html>
1 5 7 8 2 10 4 9
Matching phase: Find the string in One and Two to map to the original.
Add added strings
One: 1 5-8 2-4 9
Two 1 5 7,8 2 add:10 4 9
Merging phase: go backwards through original following the change pattern:
Start with 9 in either
Go to 4, because of change in One
Go to 10 because of change in Two
Go to 2 because of change in Two
Go to 8 because of change in One
Go to 7 because no change in original order
Go to 5 because of change in Two
Go to 1 because of change in One
Output in reverse order and get Merge!
While going through the original matches of the matching phase, one has
to recognise the "changes" inside the matches. It's either that or
splitting all matches into the non overlapping pieces that are the
numbers. This second option has proven to be more complicated.
Matching is done al la xdelta, by splitting files into chunks,
calculating hashes for each chunk. Then looking for equal hashes and
expanding the match as far as possible. At the end one can add the
strings in the "gaps" that have not been matched.
I hope the example makes it clear that the algorithm is relatively easy
to understand (maybe more so than existing ones) and creates good merges
or avoids unneccessary conflicts.
So if you got this far, and are intested in implementing go for it. I'd
be happy to help.
Torsten
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- [Monotone-devel] Merge3,
Torsten Rueger <=