emacs-devel

Re: 23.0.50; Diff refinement


From: Stefan Monnier
Subject: Re: 23.0.50; Diff refinement
Date: Thu, 08 Nov 2007 10:30:53 -0500
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/23.0.50 (gnu/linux)

>>> C-c C-b in the diff buffer produces this refinement:
>> 
>>> (foo abc (bar abc x))
>>> ^^^^^^   ^^^^  ^^^^^^
>> 
>> :-(
>> That sucks!
>> You can fix it by setting smerge-refine-weight-hack to nil.
> So is this just a bad side-effect, rather than a bug?

It's a bug alright, but it's not a simple coding bug: it's that the
assumptions made by smerge-refine-weight-hack about how `diff' works
aren't true.  And I can't think of any way to fix it right now.

The worst part is that this assumption seems to be correct most of
the time.

> It also goes away when using
>     (setq smerge-refine-ignore-whitespace nil)

But it's not clear that it wouldn't come back on a different example.

> or
>     -d  --minimal  Try hard to find a smaller set of changes.
> (In the latter case, it finds the second "abc" in the line.)

Hmm... that's interesting.  This has a good chance of ensuring that the
hypothesis holds.

For those who want to know:

The refined highlighting works by cutting a region into words.  So

   (foo bar)

is cut into

   (
   foo
    
   bar
   )

and then the other side is cut similarly and the result is passed to diff.
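To make the cutting step concrete, here is a minimal Python sketch.  The
tokenizing rule (a run of word characters is one word, every other character
is a word of its own) is an assumption chosen to reproduce the "(foo bar)"
example, not necessarily what smerge-mode does.  Python's difflib only stands
in for the external diff program, and cut_into_words / changed_words are
hypothetical names.

```python
import difflib
import re

# Hypothetical tokenizer: a run of word characters is one word, any other
# single character (space, paren, punctuation) is a word of its own.
WORD_RE = re.compile(r"\w+|\W")

def cut_into_words(region):
    """Split REGION into the units that become one line each for diff."""
    return WORD_RE.findall(region)

def changed_words(old, new):
    """Return (removed, added): the words a line diff flags on each side.

    difflib only stands in for the external diff program here; its
    matching heuristics differ from GNU diff's.
    """
    a, b = cut_into_words(old), cut_into_words(new)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    removed, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(a[i1:i2])
        if tag in ("replace", "insert"):
            added.extend(b[j1:j2])
    return removed, added

print(cut_into_words("(foo bar)"))              # ['(', 'foo', ' ', 'bar', ')']
print(changed_words("(foo bar)", "(foo baz)"))  # (['bar'], ['baz'])
```

Since each word is a whole line, a single changed character inside "bar" is
reported as the whole word "bar" being replaced, which is the granularity
refinement wants.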

smerge-refine-weight-hack changes the way the region is cut into words: the
file passed to diff has as many lines as the region has characters, i.e. for
the above example it cuts the region into:

   (
   foo
   foo
   foo
    
   bar
   bar
   bar
   )

This has two advantages:
1- it's easy to take diff's output (which has line numbers) and map it back
   into character positions in the original region.
2- if diff tries to minimize the number of lines changed (which appears to
   be what it does) rather than the number of bytes changed, then the simple
   "cut into words" tends to give too much weight to spaces and punctuation.
   smerge-refine-weight-hack counteracts this.
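Both advantages can be checked on the "(foo bar)" example with a small Python
sketch.  The tokenizer is the same assumed rule (word-character runs, or any
single other character), and plain_cut, weighted_cut and the two
*_line_to_char helpers are hypothetical names for illustration only.

```python
import re

# Hypothetical tokenizer: a run of word characters is one word, any other
# single character is a word of its own.  Assumes the region contains no
# newline, since a "\n" word would break the one-line-per-entry layout.
WORD_RE = re.compile(r"\w+|\W")

def plain_cut(region):
    """Simple cut: one line per word."""
    return WORD_RE.findall(region)

def weighted_cut(region):
    """Weighted cut: each word repeated once per character it contains."""
    return [w for w in WORD_RE.findall(region) for _ in range(len(w))]

def plain_line_to_char(words, line):
    """Plain cut: char position of 0-based LINE needs a running sum."""
    return sum(len(w) for w in words[:line])

def weighted_line_to_char(line):
    """Weighted cut: line i is character i, no bookkeeping needed."""
    return line

region = "(foo bar)"
plain, weighted = plain_cut(region), weighted_cut(region)
assert len(weighted) == len(region)  # one line per character

# Advantage 1: "bar" is line 3 of the plain cut and starts at line 5 of the
# weighted cut; both must map back to character 5 of the region.
assert plain_line_to_char(plain, 3) == 5
assert weighted_line_to_char(weighted.index("bar")) == 5

# Advantage 2: if diff counts changed lines, the plain cut charges a changed
# "bar" the same as a changed " ", the weighted cut charges it 3 times more.
print(plain.count("bar"), plain.count(" "))        # 1 1
print(weighted.count("bar"), weighted.count(" "))  # 3 1
```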

The main assumption made by smerge-refine-weight-hack is that if one of the
three lines of "foo" in the above example appears in a change, then the
other two will appear there as well.  This makes sense: if it's in a change,
that means the other file didn't have "foo" there but something else, so the
other two "foo" lines will also fail to match the other file.
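The assumption can be phrased as a check on a diff hunk: every word's block
of repeated lines must lie either entirely inside or entirely outside the
changed lines.  A minimal Python sketch, assuming the same hypothetical
tokenizer as above and a hunk given as a 0-based, half-open line range
[hunk_start, hunk_end); word_spans and split_words are illustrative names.

```python
import re

# Hypothetical tokenizer: a run of word characters, or any single other char.
WORD_RE = re.compile(r"\w+|\W")

def word_spans(region):
    """Line span [start, end) of each word in the weighted cut.

    With one line per character these are simply the words' char spans.
    """
    return [m.span() for m in WORD_RE.finditer(region)]

def split_words(region, hunk_start, hunk_end):
    """Words only partly covered by the changed lines [hunk_start, hunk_end),
    i.e. the cases where the weight-hack assumption is violated."""
    broken = []
    for start, end in word_spans(region):
        overlap = min(end, hunk_end) - max(start, hunk_start)
        if 0 < overlap < end - start:
            broken.append(region[start:end])
    return broken

print(split_words("(foo bar)", 5, 8))  # []       "bar" is wholly changed
print(split_words("(foo bar)", 6, 9))  # ['bar']  only 2 of its 3 lines
```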

But in your example, we pass the following to diff:

   ..
   ..

   abc
   abc
   abc
    
   ..
   ..

   abc
   abc
   abc

   ..
   ..

and instead of diff saying that the following is added:

   abc
   abc
   abc
    
   ..
   ..

it says that the following is added one line further:

   abc
    
   ..
   ..

   abc
   abc

which is still correct because the place where this change is added looks like

   abc
   abc
   abc

so inserting the first block before the first line or inserting the second
block after the second line both result in the same output.

The thing is: both outputs are equally valid and equally small, so diff
can't know that one is preferable.  In fact there are 4 equally valid and
equally small outputs.  But it's likely that "--minimal" will make the
search return either the "first" or the "last" one of those 4 (both of
those are fine for us; only the middle 2 are problematic).
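The four equally good outputs can be enumerated mechanically.  In this
Python sketch the ".." lines of the example are kept as placeholders and
treated as one-line words (an assumption), and the old side is reduced to a
single "abc" word.  Inserting the added block, rotated by k, after line k of
the old side reproduces the new side for every k in 0..3; only the first and
last of those leave each "abc" word whole.

```python
# Lines diff sees: the old side has one "abc" word (3 lines); the new side
# has BLOCK inserted in front of it.  ".." are placeholders for elided lines.
OLD = ["abc"] * 3
BLOCK = ["abc"] * 3 + [" ", "..", ".."]
NEW = BLOCK + OLD

# Word boundaries on the new side, as [start, end) line spans.
NEW_WORDS = [(0, 3), (3, 4), (4, 5), (5, 6), (6, 9)]

def alignments():
    """Yield (k, span): insert BLOCK rotated by k after line k of OLD.

    The assert checks that every k yields NEW with one 6-line insertion,
    so all these diffs are equally valid and equally small.
    """
    for k in range(len(OLD) + 1):
        hunk = BLOCK[k:] + BLOCK[:k]
        assert OLD[:k] + hunk + OLD[k:] == NEW
        yield k, (k, k + len(BLOCK))

def splits_a_word(span):
    """True if the inserted lines cover only part of some word."""
    start, end = span
    return any(0 < min(e, end) - max(s, start) < e - s for s, e in NEW_WORDS)

problematic = [k for k, span in alignments() if splits_a_word(span)]
print(problematic)  # [1, 2]
```

The shifted output quoted above corresponds to one of the middle alignments,
where the added lines start partway through the first "abc" word.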


        Stefan



