[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Unbalanced change hooks (part 2)

From: Alan Mackenzie
Subject: Re: Unbalanced change hooks (part 2)
Date: Sun, 31 Jul 2016 21:26:35 +0000
User-agent: Mutt/1.5.24 (2015-08-30)

Hello, Eli.

On Sun, Jul 31, 2016 at 09:11:32PM +0300, Eli Zaretskii wrote:
> > Date: Sun, 31 Jul 2016 17:28:04 +0000
> > Cc: address@hidden, address@hidden, address@hidden
> > From: Alan Mackenzie <address@hidden>

> >     This variable holds a list of functions to call before any buffer
> >     modification.

> > I can't see any sensible interpretation of "any ... modification" other
> > than "each and every modification".  I've been relying on this for many
> > years.

> You are forgetting that "buffer modifications" is not rigorously
> defined in Emacs.  Let me remind you that in so many places we pretend
> that no changes were made to a buffer, while in fact there were.

I've no problem with that, as long as each action either definitely is a
change or definitely isn't a change.  In the former case, it should run
the change hooks, in the latter case it shouldn't.  It's the halfway
house where half of the change hooks are run that I don't like.

> Given this fact, "any modification" has no meaning, beyond the
> tautological "modifications that Emacs developers decided to treat as
> such".

> IOW, what the code does, and has been doing for many moons, is in this
> case a de-facto _definition_ of what should be done.

Extending that principle ad absurdum, Emacs has no bugs at all, since
what it does is what it should do.

> > Looked at another way, how can it be sensible to call just one of these
> > hooks?

> I gave one such interpretation.

OK.  I don't agree with your principle, and I'm not convinced you do
either.  :-)

> > Like, if you were redesigning Emacs from scratch, are there any
> > categories of buffer modifications in which you would call only
> > after-change-functions, and not before-c-f?

> But we are not redesigning Emacs.  At this stage, we would be nuts
> trying to.  Emacs, at the level of insdel.c, is stable, has been for a
> long time.  We should only make there localized changes in response to
> localized problems that are revealed at that level.  In the case in
> point, the problem is revealed at a much higher level, so solving it
> in insdel.c is simply wrong, because that will destabilize Emacs for
> years to come.  We cannot afford that, as the "Emacs cloud" is too
> large, and the number of people who understand these internals is too
> small.

I'm in favour of fixing what is wrong, rather than implementing
workarounds somewhere else.  Workarounds are rarely fully satisfactory.
I think you would agree with me that having before- and after- hook
calls rigorously paired would be the ideal state - you haven't attempted
so far to argue that missing out a before-change-functions call is a
good thing.

Solving the problem in insdel.c is the Right Thing to do, because that
is where the cause of the problem lies.  You assert that the change I
propose would "destabilise Emacs for years to come", but you haven't
provided any analysis of my proposed change to support that assertion.
The hackers who built Emacs were and are of a higher quality than to
build such instability into Emacs.

> > > My interpretation of these two variables is that they are two hooks
> > > provided for different kinds of features, because hooking into both
> > > should not generally be required.  It should suffice for an
> > > application that wants to react to a change in a buffer to hook into
> > > after-change-functions only.
> > No.  A program (such as CC Mode) reacts to a _change_, not merely to
> > what a buffer looks like after a change.  A buffer change loses
> > information, so in the general case, any required info needs to be
> > recorded by before-change-functions before it gets lost in the change,
> > so that it can be used by after-change-functions.

> And yet you had no problems until now.  This particular use case is
> not about some arbitrary buffer change, it is about replacing the
> whole buffer with an updated contents of the file.

No.  The code does fancy things comparing the file to be revisited with
the current buffer contents, and only loading the minimum (in some
sense) portion of the file, so as to preserve markers, or something like
that.  In Richard Copley's recipe for bug #24094, an essential part was
the C-x C-t to transpose two lines before C-x C-s.  The complexities of
the transposition got remembered when reloading the file portion,
leading to the bug.

> We don't need to consider any other use cases, because we are not
> aware of problems in those cases.

If we fixed things the way I would like, we'd automatically fix these
other use cases without any effort.

> > As a concrete example of this, suppose we have this in a buffer:
> > 
> >     #define foo() first line \
> >                   second line \
> >               third line \
> >               fourth line
> > 
> > , and the change consists of deleting the backslash on L2.
> > before-change-functions will have noted that all four lines are in the
> > "active area".  after-change-functions unassisted would merely see that
> > the "active area" consists of L1 and L2.  The info from before-change
> > enables after-change to operate on the entire four lines.

> You could simply discard the information for all the 4 lines, and be
> done.  It is more important to be correct than to be optimal.

It is for correctness, not speed, that the information is recorded.
Over the last decade or so, there were quite a few bugs where jit-lock
delayed fontification messed up what had been (for 0.5s) correct, or
where typing a CR in the middle of construct messed fontification up, or
....  Please believe me, this recording of info in c-before-change was
not done just for its own sake.

For example, if a C++ template delimiter, which is marked with a
parenthesis syntax-table property, is about to be deleted, we ensure in
c-before-change that the matching delimiter is appropriately handled.

I'm not saying that CC Mode couldn't be redesigned not to use
before-change-functions, but it would be a massive amount of work which
wouldn't get us any closer to where we want CC Mode to be, and it would
be depressing unrewarding work.

> But anyway, I believe the case in point has nothing to do with this
> example.  Right?

No, the example offers a justification of why CC Mode needs both before-
and after-change-functions.

> > > But even if I agree, for the sake of argument, with your
> > > interpretation, let's put this issue into a proper perspective, okay?
> > > We have a single major mode (albeit a very important one, one that I
> > > use every day) that under very specific conditions apparently fails to
> > > discard its caches, ....

> > Being precise, two critical variables were dirty store from the previous
> > invocation of c-after-change, not having been set in the missing call to
> > before-change-functions.

> I still don't understand why you couldn't clear all such variables in
> this case.  The entire buffer text has been replaced, why keep
> anything you had before?

The entire buffer hasn't been replaced (see above).  Those variables
get initialised in c-before-change.  If the call to c-before-change is
missing, those variables will be dirty store.

> > > .... because its design assumed calls to these two hooks will always
> > > be balanced.  As a solution, you are proposing a change in a very
> > > low-level and delicate part of Emacs, which means significant changes
> > > to code that was basically unchanged for more than 20, if not 30
> > > years.
> > prepare_to_modify buffer has been modified lots of times in recent
> > years.  (I count six modifications since 2014-01.)

> I said "basically unchanged".  Trivial refactoring doesn't count.  And
> any changes in this function were in response to issues common to all
> the callers.  The case in point isn't such a case.

What I am proposing is only a little more than trivial refactoring.

> > > This code is involved in almost every operation on buffer
> > > text, and so changes you propose will necessarily cause ripples all
> > > over Emacs.  Just look what the code you propose to change does:

> > >   . call barf-if-buffer-read-only
> > >   . set the buffer's redisplay flag
> > >   . affect the undo recording
> > >   . save the region when select-active-regions is non-nil
> > >   . sets deactivate-mark non-nil
> > I'm not proposing to change any of that at all.  What I'm proposing is
> > strictly limited, and could be accompished in two steps, as follows:
> > 1. (Pure refactoring).  Remove the call to signal_before_change from
> > prepare_to_modify_buffer.  Insert an explicit call to s_b_c after each
> > invocation of p_t_m_b.  Restore the check on inhibit_modification_hooks
> > in s_b_c.
> > 2. Where a call to signal_buffer_change is within an "if (prepare) ..."
> > construct, move it to after that construct, so that
> >     if (prepare)
> >       {
> >         prepare_to_modify_buffer (...);
> >     signal_after_change (...);
> >       }
> > would become
> >     if (prepare)
> >       prepare_to_modify_buffer (...);
> >     signal_after_change (...);

> See, that's the part that scares me.  You are proposing to run code
> where we didn't run it before.  Did you look at what
> signal_after_change does?  It doesn't just call
> after-change-functions, it does much more.

I'm proposing to run code designed to run before buffer changes before a
buffer change.  The code does a little more than call
before-change-functions but only a little:
(i) it runs first-change-hook;
(ii) it runs before-change-functions;
(iii) it runs report_overlay_modification (whatever that does).
(iv) it preserves *preserve_ptr as described in the comment.

I would be more worried that there are buffer modifications where
report_overlay_modification _doesn't_ get run.

> > > Can you imagine how much Lisp code and basic features will be affected
> > > by calling this where we never did before?  All that for the benefit
> > > of a single major mode that fails in a single, albeit important use
> > > case?

> > > I'm sorry, but that makes very little sense to me.
> > Please reconsider carefully the concrete proposal I've made (above) and
> > judge whether it will really affect very much at all.

> Your proposal doesn't take care of my fears.

I'm still trying to work out what they are, beyond a generalised
opposition to any change at this level of the code.

> > I can foresee that nothing will happen except the desired
> > invocations of before-change-functions being made.

> We will have to agree to disagree about this.

OK.  How about I implement what I've described, and we try it out?  If
it doesn't work, or it throws up non-trivial problems, it will be easy
enough to revert.

> > I'm not entirely sure that only CC Mode will be helped here.  The value
> > of before-change-functions in many major modes contains the function
> > syntax-ppss-flush-cache.  There is a good chance that some call of, say,
> > del_range_byte is going to leave the syntax-ppss cache stale and
> > invalid, a difficult situation to diagnose.

> I would have to see very specific evidence to that, not just
> theoretical possibilities.

Programs other than CC Mode use before-change-functions.  It would be
more effort to construct an actual failure than simply to fix the

> And even then, the proposed change touches much more than just
> before-change-functions.

> > > What does it do in each one of them, in the specific use case reported
> > > in the named bugs?

> > It should calculate the two variables c-new-BEG and c-new-END (which
> > bound the active region) in c-before-change, and these get modified in
> > c-after-change.

> Sorry, I don't understand.  Why can't you compute these two values in
> the c-after-change?

In the general case because information gets lost at a change; things
like where the beginning of a statement was, where deleted template
delimiters used to be, what the bounds of a macro or string used to be
(before backslashes were deleted).

Or are you suggesting I implement a special after-change handler for the
case when the before-change-functions was missing?  This would be
possible, but it would still be more work than ensuring that
before-change-functions actually gets called.

> And if you can't, why calling c-before-change
> when REPLACE is non-nil didn't fix that?  Once again, we are talking
> about reverting the entire buffer, so any intermediate deletions and
> insertions aren't important.

Again, the entire buffer wasn't replaced.  Read the comment in
Finsert_file_contents at ~L3697:

  /* If requested, replace the accessible part of the buffer
     with the file contents.  Avoid replacing text at the
     beginning or end of the buffer that matches the file contents;
     that preserves markers pointing to the unchanged parts.

> > In the bug situation, these variables simply held stale values from
> > the previous c-after-change call (which was from transpose-lines).
> > Specifically, c-after-change modified c-new-END by subtracting
> > old-len.  This happened to make c-new-END < c-new-BEG.  So a
> > subsequent re-search-forward invocation had the limit on the wrong
> > side of point and threw an error.

> Why can't this error be caught and interpreted as a sign that the
> cached values are wrong and need to be recomputed?

It's better (and less work) to determine them properly in the first
place.  Once we're in c-after-change, it's no longer possible to
calculate them properly in the general case.

> > > And why isn't it enough to make only the change you proposed in part 1
> > > of your report?

> > I tried that, but it didn't work.  With the help of gdb, it was clear
> > that the missing before-change-functions came from an invocation of
> > del_range_byte with `prepare' false, rather than directly from
> > Finsert_file_contents.

> So if that single call is with 'prepare' set to true (when both VISIT
> and REPLACE are non-nil), the problem would have been solved?

Yes, apart from all the other processing in prepare_to_modify_buffer
that the `prepare' flag being false is designed to avoid.

> > > 2) Can this problem be fixed in CC Mode itself, without touching
> > > either insert-file-contents or insdel.c functions?

> > I honestly don't know.  That would require extensive redesign if it's
> > possible.  That would need something like creating a new cache to
> > hold the information for the entire buffer which is currently determined
> > in c-before-change.  A simple workaround would be to ignore (i.e. not
> > process) an out of sequence invocation of c-after-change, but that's
> > hardly ideal.  I suspect I'm going to have to do that anyway in
> > standalone CC Mode.

> If you are going to have to do that anyway, why do we have to consider
> such low-level changes?

Because a proper fix is the Right Thing to do.  It's more satisfying,
works better, and eliminates other potential problems which haven't
shown up yet, or undiagnosed bugs caused by this which we haven't been
able to tie down.

> Btw, there's also a possibility for CC Mode to define its own
> revert-buffer function.  That function could be a thin wrapper around
> revert-buffer, and could do in the wrapping code whatever you think is
> missing in insert-file-contents, like forcefully calling
> before-change-functions.  That would still be better than messing with
> insdel.c, IMO.

Reverting the buffer is merely the recipe to reproduce the bug.  The bug
itself is c-before-change not being called.  It is likely to show up in
other circumstances if we don't fix it.

> > > 3) If the answer to 2) above is a categorical NO (but please provide
> > > enough arguments so we could be very sure it is that), then I _might_
> > > be convinced to consider changes to insert-file-contents only, and
> > > only for the case when both VISIT and REPLACE are non-nil.  That's
> > > because I think this combination is only used by revert-buffer and
> > > similar functionality (if someone knows about other callers that do
> > > this, please holler), ....
> > The actual operation was C-x C-f where the file was already visited in
> > Emacs, but had been modified outside of Emacs, so Finsert_file_contents
> > went through the code which attempts to preserve markers, etc.

> That's what revert-buffer does, it calls insert-file-contents with
> both VISIT and REPLACE non-nil.  And that's the use case we are
> talking about, right?

More or less.  C-x C-f also goes through these machinations in the case

> > > .... so the ripples will be not as wide as if we mess with insdel.c.
> > > (Still, even this possibility scares the living s**t out of me; it
> > > should do that to you as well.)  To this end, please tell which calls
> > > to del_range_byte in insert-file-contents are involved with this use
> > > case and cause trouble to CC Mode, and maybe we could change only
> > > those, in addition to the 'if' clause that guards the call to
> > > before-change-functions.

> > The del_range_byte call which I caught with gdb last night was, I think,
> > the one at fileio.c L4037.  I don't know the code well enough to say
> > that that's the only one which might trigger.

> If changing that one alone solves the problem, we don't need to
> consider any others.

As I said, why not let me implement what I've suggested?  If it doesn't
work well enough, I can revert it.

> Thanks.

Alan Mackenzie (Nuremberg, Germany).

reply via email to

[Prev in Thread] Current Thread [Next in Thread]