
Re: [Monotone-devel] Re: Text under revision control


From: hendrik
Subject: Re: [Monotone-devel] Re: Text under revision control
Date: Sat, 30 May 2009 11:05:35 -0400
User-agent: Mutt/1.5.13 (2006-08-11)

On Fri, May 29, 2009 at 12:23:21PM -0700, Zack Weinberg wrote:
> On Fri, May 29, 2009 at 12:00 PM,  <address@hidden> wrote:
> > On Fri, May 29, 2009 at 06:39:18PM +0000, Hendrik Boom wrote:
> >> So.  Monotone does appear to merge on a line-by-line basis.
> >>
> >> Too bad for OpenOffice's .fodt file type.
> >
> > Actually, byte-by-byte or word-by-word probably wouldn't be enough.  We'd 
> > need
> > something that can guarantee to produce valid XML that satisfies Open 
> > Document
> > Format syntax.
> 
> A three-way merge algorithm aware of not only XML but the ODF schema
> is definitely out of scope for monotone itself.

And possibly not easy.  I've seen a few XML merge algorithms.  The 
robust ones were too expensive to run on anything but baby examples.

> But I'd be happy to
> have a mechanism that could dispatch noninteractive three-way merges
> to external tools based on file attributes, file name extensions, or
> "magic numbers" in the file contents.  (And of course also dispatch
> *interactive* conflict resolution requests similarly.)

And the protocol for the plug-in mergers should allow them to say, 
"Sorry, it looked like an ODF file, but the contents are so incompatible 
with ODF that you're going to have to handle it some other way."  This 
could be an indication to use some less-specialized merge, or to ask 
the user for advice.
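
A rough sketch of such a protocol, in Python rather than anything
monotone actually provides.  Every name here (Status, xml_merger,
fallback_merger, MERGERS, dispatch) is made up for illustration, and
the "XML merger" only checks well-formedness and then handles the
trivial cases; the point is just that a merger can answer DECLINED and
the dispatcher moves on to a less specialized one.

```python
import xml.etree.ElementTree as ET
from enum import Enum


class Status(Enum):
    MERGED = "merged"      # merger produced a result
    CONFLICT = "conflict"  # merger understood the file but needs the user
    DECLINED = "declined"  # "looked like my format, but it isn't"


def trivial_merge(ancestor, left, right):
    """Handle only the cases where at most one side changed."""
    if left == right or right == ancestor:
        return Status.MERGED, left
    if left == ancestor:
        return Status.MERGED, right
    return Status.CONFLICT, None


def xml_merger(ancestor, left, right):
    """Hypothetical format-aware merger: declines unless all inputs parse."""
    try:
        for blob in (ancestor, left, right):
            ET.fromstring(blob)
    except ET.ParseError:
        return Status.DECLINED, None
    # A real XML/ODF-aware merge would go here (assumption: not shown).
    return trivial_merge(ancestor, left, right)


def fallback_merger(ancestor, left, right):
    """Least specialized merger; stands in for the default algorithm."""
    return trivial_merge(ancestor, left, right)


# Most specialized first.  The predicates see the file name and the
# ancestor contents, so they could also test "magic numbers".
MERGERS = [
    (lambda name, data: name.endswith(".fodt"), xml_merger),
    (lambda name, data: True, fallback_merger),
]


def dispatch(name, ancestor, left, right):
    """Try each matching merger in turn until one does not decline."""
    for matches, merger in MERGERS:
        if not matches(name, ancestor):
            continue
        status, data = merger(ancestor, left, right)
        if status is not Status.DECLINED:
            return merger.__name__, status, data
    # Everyone declined: time to ask the user for advice.
    return None, Status.DECLINED, None
```

With this, dispatch("doc.fodt", b"not xml", b"not xml", b"changed")
falls through from xml_merger to fallback_merger instead of failing.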

> 
> I could argue either way on the question of whether the default
> algorithm should be byte- or word-oriented rather than line-oriented.
> It's the usual tradeoff between false conflict and false lack of
> conflict -- for our "core competency" of program source code, two
> changes that modify the same line could easily be a true conflict even
> if they are independent in terms of byte ranges modified.

For English ASCII text, word-oriented or byte-oriented merging might be 
better, since (except for poetry) exact line boundaries are not 
relevant (except when they're used for paragraph breaks).  Programs are 
much like poetry as far as layout-sensitivity is concerned.  I once met 
someone who had figured that out and started every line of his code 
with a capital letter (in a case-insensitive language, of course).

Perhaps this is a case for the file-type-dependent merge algorithm.
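
To make the tradeoff concrete, here is a small Python sketch using the
standard difflib module.  It is not monotone's merge algorithm; the
helper names (changed_ranges, conflicts) and the "touching ranges
count as a conflict" rule are my own simplifying assumptions.  It only
asks whether the two sides changed overlapping parts of the ancestor,
first with lines as the unit and then with words.

```python
from difflib import SequenceMatcher


def changed_ranges(base, other):
    """Ranges of `base` (as token indices) that `other` modified."""
    sm = SequenceMatcher(None, base, other, autojunk=False)
    return [(i1, i2) for tag, i1, i2, _, _ in sm.get_opcodes()
            if tag != "equal"]


def conflicts(base, left, right):
    """True if any change on one side touches a change on the other.

    Conservative assumption: ranges that merely touch are treated as
    conflicting, so adjacent edits are never merged silently.
    """
    return any(a1 <= b2 and b1 <= a2
               for a1, a2 in changed_ranges(base, left)
               for b1, b2 in changed_ranges(base, right))


base = "the quick brown fox jumps over the lazy dog\n"
left = "the fast brown fox jumps over the lazy dog\n"
right = "the quick brown fox jumps over the sleepy dog\n"

# Line-oriented: each text is a list of whole lines.
by_line = conflicts(base.splitlines(True),
                    left.splitlines(True),
                    right.splitlines(True))

# Word-oriented: each text is a list of words.
by_word = conflicts(base.split(), left.split(), right.split())

print("line-oriented conflict:", by_line)
print("word-oriented conflict:", by_word)
```

The same pair of edits conflicts at line granularity and not at word
granularity.  Whether the word-level answer is the right one depends
on the file type, which is the argument for choosing per type.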

> On the
> other hand, I've heard plenty of "this is obviously not a conflict,
> why did the computer throw a conflict at me" complaints where the
> issue was that it wasn't going down to finer than lines, and Ediff's
> ability to do byte range comparison within a conflict is very handy.

I guess the other question is movement of blocks of text.  diff 
treats the interchange of blocks of text as a deletion of one block and 
subsequent insertion of the other block.  Now one of the ways to  
express a changed file in terms of an old one (possibly the one monotone 
uses to store changed files efficiently) does relate the blocks of the 
new file to those of the old file, because that's an efficient way to 
compress the new file (just say what to copy from the old file).  I'm 
wondering whether the process of merging chains of changes actually pays 
attention to that information.  Otherwise changes made to the deleted 
block would be deleted along with the block and not propagated to the 
newly inserted block.

       ABCCD
        / \
       /   \
      /     \
     /     AbCCD    where b fixed a small typo in B.
  ACCBD     /
     \     /
      \   /
       \ /
      ACCBD
   or ACCbD ?

I suppose this is another thing to try out.
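
As a first experiment, outside monotone, Python's difflib shows how a
plain sequence diff sees the two sides of the diagram above.  This is
only a sketch: difflib is not the diff monotone uses, and the helper
name edits is my own.  Each letter stands for one block.

```python
from difflib import SequenceMatcher


def edits(base, other):
    """Non-equal diff operations as (tag, removed, added, base_range)."""
    sm = SequenceMatcher(None, base, other, autojunk=False)
    return [(tag, base[i1:i2], other[j1:j2], (i1, i2))
            for tag, i1, i2, j1, j2 in sm.get_opcodes()
            if tag != "equal"]


base = list("ABCCD")
moved = list("ACCBD")   # one side moves block B after CC
fixed = list("AbCCD")   # the other side fixes a typo inside B

move_edits = edits(base, moved)
fix_edits = edits(base, fixed)

for tag, removed, added, where in move_edits:
    print("move side:", tag, removed, added, where)
for tag, removed, added, where in fix_edits:
    print("fix side: ", tag, removed, added, where)

# A move-aware tool could notice that what was deleted in one place is
# exactly what was inserted in another, and redirect the fix there.
deleted = [removed for tag, removed, _, _ in move_edits if tag == "delete"]
inserted = [added for tag, _, added, _ in move_edits if tag == "insert"]
looks_like_move = bool(deleted) and deleted == inserted
print("delete/insert pair looks like a move:", looks_like_move)
```

The move shows up as a delete of B at its old position plus an
unrelated insert of B later on, and the typo fix lands on exactly the
range that the other side deleted.  What a merge does with that
(report a conflict, or drop the fix) depends on the tool; nothing in
the diff itself ties the inserted B back to the deleted one.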

I'm currently using an ad-hoc, very forgiving markup system for my 
document writing.  I can mechanically convert this to the much less 
forgiving ODF or PostScript when necessary.  I avoid inserting or 
deleting (really insignificant) newlines so as not to mess with 
line-oriented revision control.  But when my document extends over many 
files, and pieces get moved from one place to another, or even from one 
file to another, I suspect change-tracking and merging really start to 
break down.

There is a gap in the revision-control marketplace for systems that 
handle word-processor formats.  Whatever first adequately fills this gap 
is likely to become entrenched.

-- hendrik



