Looking for diff4

bug-gnu-utils

From:	Stuart Ballard
Subject:	Looking for diff4
Date:	Fri, 21 Feb 2003 09:57:23 -0500
User-agent:	Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021210 Debian/1.2.1-3

[I really wanted to send this to a diffutils discussion mailing list, but I couldn't find one; this is the only mailing list mentioned on the diffutils homepage. So I know this is probably off-topic; if somebody could point me to a better forum for this suggestion, I'd be more than happy to move it there.]


I've been thinking for a while that there's a need for a 'diff4' program
which is to diff3 as diff3 is to regular diff. The particular use case I
have in mind is repeated merges of a branched file in a source control
system. Example:

Trunk  -- t1 -- t2 ------------ t3
            \      \               \
Branch       '- b1 -(m1)- b2 -- b3 -(m2)- b4

t1, t2 and t3 are trunk versions of the file; b1 - b4 are branch
versions. m1 and m2 are merges from the trunk to the branch.

m1 is, of course, the classic use case for diff3. There is a common
ancestor file (t1) and two divergent branches (t2 and b1) which need
to be merged.

m2, on the other hand, cannot be done adequately with diff3. The reason
is that the 'common ancestor' file has already diverged - on the trunk
it's t2, and on the branch it's b2. You don't want to re-merge all the
way back to the only true common ancestor (t1) because any conflicts
there have already been resolved - why resolve them again?

So instead of the three files t1, t2 and b1, we now have four files to
merge: t2, t3, b2 and b3. I believe it's possible that this merge can
be done more effectively than just 'diff t2 t3 | patch b3', in the same
way that 'diff3 t2 t1 b1' does better than 'diff t1 t2 | patch b1'. At
least I hope it can.

Unfortunately I'm not familiar with how the diff3 algorithm actually
works (I've read the manpages in which the diff and patch algorithms
are roughly described, but haven't seen anything equivalent for diff3 -
anyone have any pointers?) but I can think of two 'obvious' possible
approaches to diff3's problem space:

1) Treat diff3 a b c as similar to diff b a | patch c, and just take
advantage of the symmetry between a and c to help resolve conflicts.
If this is the approach that diff3 takes then, as far as I can see, it
isn't possible to do better than diff|patch for the diff4 problem,
because the symmetry doesn't exist in that case (merging from trunk to
branch should give *different* results than merging from branch to
trunk).
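To make approach 1 concrete, here is a much-simplified model of 'diff b a | patch c' in Python, built on the standard difflib module. It is only a sketch under stated assumptions: naive_patch is a hypothetical name, each hunk is located by an exact match on one line of context on either side (real patch uses more context plus a fuzz factor), and hunks that can't be found are returned as rejects, roughly like .rej files.

```python
import difflib


def naive_patch(old, new, target):
    """Apply the edits that turn `old` into `new` onto `target`.

    Hypothetical, simplified model of `diff old new | patch target`:
    each changed hunk is located in `target` by searching for the hunk's
    original lines plus one line of context on each side (exact match
    only, no fuzz). Hunks that cannot be located are returned as rejects.
    """
    result = list(target)
    rejects = []
    offset = 0  # how far line numbers in `result` have drifted from `old`
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # One context line each side; these are always unchanged lines,
        # because difflib separates non-equal opcodes with equal ones.
        lo, hi = max(0, i1 - 1), min(len(old), i2 + 1)
        pattern = old[lo:hi]
        n = len(pattern)
        hits = [p for p in range(len(result) - n + 1) if result[p:p + n] == pattern]
        if not hits:
            rejects.append((i1, i2, new[j1:j2]))
            continue
        # Prefer the match closest to where the hunk is expected to be.
        p = min(hits, key=lambda q: abs(q - (lo + offset)))
        start = p + (i1 - lo)
        result[start:start + (i2 - i1)] = new[j1:j2]
        offset = p - lo + (j2 - j1) - (i2 - i1)
    return result, rejects
```

Called as naive_patch(b, a, c), it plays the role of 'diff b a | patch c': the changes from b to a are carried over to c wherever their context can still be found, and nothing else about c is consulted.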

2) Given diff3 a b c, calculate diff b a and diff b c, then combine the
two diffs (resolving conflicts at this point) and apply the resulting
patch to the common ancestor b. If diff3 takes this approach, I can see
a potential way to
extend it into diff4, as follows (assuming we're dealing with [tb][23] as
in the example above):

- Calculate diff t2 t3 and diff b2 b3.
- Construct an in-memory data structure corresponding to the 'union' of
  t2 and b2. I'm thinking something like a context diff but with an
  infinite amount of context, so that all the common content is included
  as well as the different bits. With the capabilities of diff, it should
  be pretty trivial to build a data structure like this that makes it
  possible to take a line number in t2 and identify the corresponding
  line in b2, and vice versa (or indicate that there is no such
  corresponding line because that section has diverged).
- Now combine the two diffs, just as in diff3 as described above, except:
  - Remap the line numbers from the t2-t3 diff so that the combined diff
    applies to b2, not t2.
  - There's an extra kind of conflict that can't happen in diff3: the case
    where something from the t2-t3 diff applies to a section of the file
    that doesn't exist in b2. Allow manual resolution of problems like
    that, just as diff3 does. One possible resolution would be to use
    the 'patch' algorithm to find an alternative place in the file to
    apply it.
- Finally, apply the combined diff to b2.
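The steps above can be sketched roughly in Python with difflib. This is an illustration under assumptions, not a worked-out design: all function names are hypothetical; the t2/b2 'union' is represented as a plain list mapping each t2 line to its b2 line (or None where the two have diverged); trunk hunks that conflict are reported and left out, keeping the branch side, instead of being offered for manual resolution; the patch-style fallback for missing sections is not attempted; and hunks that merely touch are conservatively treated as conflicts.

```python
import difflib


def hunks(old, new):
    """Changed regions of old -> new as (start, end, replacement) in old's line numbers."""
    sm = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return [(i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]


def line_map(src, dst):
    """The 'union' structure: m[i] is the dst line matching src[i], or None if diverged."""
    m = [None] * len(src)
    sm = difflib.SequenceMatcher(None, src, dst, autojunk=False)
    for i, j, n in sm.get_matching_blocks():
        for k in range(n):
            m[i + k] = j + k
    return m


def remap(hunk, m):
    """Move a hunk from t2 line numbers to b2 line numbers; None if it can't be placed."""
    i1, i2, repl = hunk
    if i1 == i2:  # pure insertion: anchor it just after the preceding line
        if i1 == 0:
            return (0, 0, repl)
        prev = m[i1 - 1]
        return None if prev is None else (prev + 1, prev + 1, repl)
    j = m[i1]
    # Every line the hunk touches must still exist, contiguously, in b2.
    if j is None or any(m[k] != j + (k - i1) for k in range(i1, i2)):
        return None
    return (j, j + (i2 - i1), repl)


def touches(h, g):
    """Conservative conflict test: ranges that overlap or merely touch conflict."""
    return h[0] <= g[1] and g[0] <= h[1]


def diff4(t2, t3, b2, b3):
    """Merge the trunk change t2->t3 into a branch that has gone from b2 to b3."""
    m = line_map(t2, b2)
    branch = hunks(b2, b3)
    accepted, conflicts = list(branch), []
    for h in hunks(t2, t3):
        r = remap(h, m)
        if r is None:
            conflicts.append(("missing", h))   # section doesn't exist in b2
        elif r in branch:
            continue                           # same change made on both sides
        elif any(touches(r, g) for g in branch):
            conflicts.append(("overlap", h))   # ordinary diff3-style conflict
        else:
            accepted.append(r)
    result = list(b2)
    # Apply bottom-up so earlier line numbers stay valid; at equal starts,
    # the longer range goes first so an insertion lands in front of it.
    for j1, j2, repl in sorted(accepted, key=lambda x: (x[0], x[1]), reverse=True):
        result[j1:j2] = repl
    return result, conflicts
```

For example, with t2 = [a, b, c, d], t3 = [a, B, c, d], b2 = [header, a, b, c, d] and b3 = [header, a, b, c, D], diff4 returns ([header, a, B, c, D], []): the trunk hunk at t2 line 1 is remapped to b2 line 2 before being combined with the branch change. When t2 and b2 are identical the map is the identity, and the sketch reduces to combining diff t2 t3 with diff t2 b3 and applying the result to t2.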

Any thoughts? Have I wasted a whole lot of time thinking about something
that can't actually be done? Has someone already done it? Is my
understanding of the diff3 algorithm so hopelessly wrong that all my
conclusions are off? Is anyone interested in pursuing this problem
at all?

Thanks for any feedback,
Stuart.

PS Oh, and is there a good reason why sdiff seems to be limited to
two-way diffs? It seems, in fact, to be only up to the standard of
'diff file1 file2 | patch file1', and not even able to provide an
interactive equivalent to the full functionality of patch (where the
file being patched is different from the source of the diff). It seems
to me that interactive merging is MOST useful in situations where
conflicts can come up - an sdiff-like interactive interface would be
vastly superior to manually handling .rej and .orig files. The same,
only even more so, applies to diff3 and diff4 - interactivity is
invaluable in those cases!


--
Stuart Ballard, Programmer
NetReach - Internet Solutions
(215) 283-2300, ext. 126
http://www.netreach.com/
