Looking for diff4
From: Stuart Ballard
Subject: Looking for diff4
Date: Fri, 21 Feb 2003 09:57:23 -0500
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021210 Debian/1.2.1-3
[I really wanted to send this to a diffutils discussion mailing list,
but I couldn't find one; this is the only mailing list mentioned on the
diffutils homepage. Therefore, I know this is probably off-topic - if
somebody could point me to a better forum for this suggestion, I'd be
more than happy.]
I've been thinking for a while that there's a need for a 'diff4' program
which is to diff3 as diff3 is to regular diff. The particular use case I
have in mind is repeated merges of a branched file in a source control
system. Example:
Trunk   -- t1 -- t2 ------------- t3
             \      \               \
Branch        '- b1 -(m1)- b2 -- b3 -(m2)- b4
t1, t2 and t3 are trunk versions of the file; b1 - b4 are branch
versions. m1 and m2 are merges from the trunk to the branch.
m1 is, of course, the classic use case for diff3. There is a common
ancestor file (t1) and two divergent branches (t2 and b1) which need
to be merged.
m2, on the other hand, cannot be done adequately with diff3. The reason
is that the 'common ancestor' file has already diverged - on the trunk
it's t2, and on the branch it's b2. You don't want to re-merge all the
way back to the only true common ancestor (t1) because any conflicts
there have already been resolved - why resolve them again?
So instead of the three files t1, t2 and b1, we now have four files,
t2, t3, b2 and b3 to merge. I believe it's possible that this merge
can be done more effectively than just 'diff t2 t3 | patch b3' in the
same way that diff3 t2 t1 b1 can be done more effectively than
'diff t1 t2 | patch b1'. At least I hope it can.
Unfortunately I'm not familiar with how the diff3 algorithm actually
works (I've read the manpages in which the diff and patch algorithms
are roughly described, but haven't seen anything equivalent for diff3 -
anyone have any pointers?) but I can think of two 'obvious' possible
approaches to diff3's problem space:
1) Treat diff3 a b c as similar to diff b a | patch c, and just take
advantage of the symmetry between a and c to help resolve conflicts.
If this is the approach that diff3 takes then, as far as I can see, it
isn't possible to do better than diff|patch for the diff4 problem,
because the symmetry doesn't exist in that case (merging from trunk to
branch should give *different* results than merging from branch to
trunk).
2) Given diff3 a b c, calculate diff b a and diff b c, then combine the
two diffs (resolving conflicts at this point) and apply the resulting
patch to b, the common ancestor that both diffs start from. If diff3
takes this approach, I can see a potential way to
extend it into diff4, as follows (assuming we're dealing with [tb][23] as
in the example above):
- Calculate diff t2 t3 and diff b2 b3.
- Construct an in-memory data structure corresponding to the 'union' of
t2 and b2. I'm thinking something like a context diff but with an
infinite amount of context, so that all the common content is included
as well as the different bits. With the capabilities of diff, it should
be pretty trivial to build a data structure like this that makes it
possible to take a line number in t2 and identify the corresponding
line in b2, and vice versa (or indicate that there is no such
corresponding line because that section has diverged).
- Now combine the two diffs, just as diff3 does (as described above), except:
  - Remap the line numbers from the t2-t3 diff so that the combined diff
    applies to b2, not t2.
  - There's an extra kind of conflict that can't happen in diff3: the case
    where something from the t2-t3 diff applies to a section of the file
    that doesn't exist in b2. Allow manual resolution of problems like
    that, just like diff3 does. One possible resolution might be to use
    the 'patch' algorithm to find an alternative place in the file to
    apply it.
- Finally, apply the combined diff to b2.
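To make the steps above concrete, here is a rough Python sketch. It is only an illustration under stated assumptions, not how GNU diff or diff3 are actually implemented. Files are treated as lists of lines compared exactly, and Python's difflib.SequenceMatcher stands in for diff. The function names (hunks, line_map, remap, combine, diff4) are hypothetical. A trunk hunk is applied only when every line it touches has a contiguous counterpart in b2. Anything else is returned as an "orphan" for manual resolution, and no patch-style search for an alternative location is attempted. When t2 and b2 are identical the line map is the identity, so the merge reduces to approach 2 for diff3.

```python
from difflib import SequenceMatcher  # assumption: stands in for diff


def hunks(base, other):
    """Regions where `other` differs from `base`, as (start, end, replacement)."""
    sm = SequenceMatcher(None, base, other, autojunk=False)
    return [(i1, i2, other[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]


def line_map(t2, b2):
    """The 'union' of t2 and b2: map each t2 line index to its matching b2 index.
    t2 lines missing from the map belong to sections that have diverged."""
    m = {}
    sm = SequenceMatcher(None, t2, b2, autojunk=False)
    for a, b, size in sm.get_matching_blocks():
        for k in range(size):
            m[a + k] = b + k
    return m


def remap(hunk, m):
    """Translate a t2 hunk into b2 coordinates, or return None when the region
    it touches does not exist in b2 (the extra kind of conflict)."""
    s, e, new = hunk
    if s == e:  # pure insertion: anchor it just after the preceding line
        if s == 0:
            return (0, 0, new)
        return (m[s - 1] + 1, m[s - 1] + 1, new) if s - 1 in m else None
    targets = [m.get(k) for k in range(s, e)]
    if None in targets or targets != list(range(targets[0], targets[0] + e - s)):
        return None
    return (targets[0], targets[0] + e - s, new)


def combine(base, trunk, branch):
    """diff3-style merge of two hunk lists against `base`.
    Overlapping or touching hunks form one group; a group is a conflict only
    when the two sides produce different text for it."""
    tagged = sorted([h + (0,) for h in trunk] + [h + (1,) for h in branch],
                    key=lambda h: (h[0], h[1]))
    out, pos, conflicts, i = [], 0, 0, 0
    while i < len(tagged):
        lo, hi, group = tagged[i][0], tagged[i][1], [tagged[i]]
        i += 1
        while i < len(tagged) and tagged[i][0] <= hi:
            hi = max(hi, tagged[i][1])
            group.append(tagged[i])
            i += 1
        versions = []  # base[lo:hi] as rewritten by each side
        for k in (0, 1):
            v, p = [], lo
            for s, e, new, side in group:
                if side == k:
                    v += base[p:s] + new
                    p = e
            versions.append(v + base[p:hi])
        sides = {h[3] for h in group}
        out += base[pos:lo]
        if len(sides) == 1:
            out += versions[sides.pop()]
        elif versions[0] == versions[1]:
            out += versions[0]  # both sides made the same change
        else:
            conflicts += 1
            out += (["<<<<<<< trunk"] + versions[0] + ["======="]
                    + versions[1] + [">>>>>>> branch"])
        pos = hi
    return out + base[pos:], conflicts


def diff4(t2, t3, b2, b3):
    """Merge trunk changes (t2->t3) into the branch (b2->b3), applied to b2.
    Returns (merged_lines, conflict_count, orphaned_trunk_hunks)."""
    m = line_map(t2, b2)
    trunk, orphans = [], []
    for h in hunks(t2, t3):
        r = remap(h, m)
        if r is None:
            orphans.append(h)  # left for manual resolution
        else:
            trunk.append(r)
    merged, conflicts = combine(b2, trunk, hunks(b2, b3))
    return merged, conflicts, orphans


if __name__ == "__main__":
    # The branch dropped "x" and added "y" before this merge; trunk edits "b".
    print(diff4(["x", "a", "b", "c"], ["x", "a", "B", "c"],
                ["a", "b", "c", "y"], ["a", "b", "c", "Y"]))
    # (['a', 'B', 'c', 'Y'], 0, [])
```

Touching hunks from the two sides are grouped together, so they are flagged as conflicts even when they could be applied independently. That choice is deliberately conservative.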
Any thoughts? Have I wasted a whole lot of time thinking about something
that can't actually be done? Has someone already done it? Is my
understanding of the diff3 algorithm so hopelessly wrong that all my
conclusions are off? Is anyone interested in pursuing this problem
at all?
Thanks for any feedback,
Stuart.
PS Oh, and is there a good reason why sdiff seems to be limited to
two-way diffs? It seems, in fact, to be only up to the standard of 'diff
file1 file2 | patch file1', and not even able to provide an interactive
equivalent to the full functionality of patch (where the file being
patched is different from the source of the diff). It seems to me that
interactive merging is MOST useful in situations where conflicts can
come up - an sdiff-like interactive interface would be vastly superior
to manually handling .rej and .orig files. The same, only even more so,
applies to diff3 and diff4 - interactivity is invaluable in those cases!
--
Stuart Ballard, Programmer
NetReach - Internet Solutions
(215) 283-2300, ext. 126
http://www.netreach.com/