Re: A possible way for CC Mode to resolve its sluggishness
From: Alan Mackenzie
Subject: Re: A possible way for CC Mode to resolve its sluggishness
Date: Sat, 27 Apr 2019 13:57:25 +0000
User-agent: Mutt/1.10.1 (2018-07-13)
Hello, Stefan.
On Fri, Apr 26, 2019 at 22:10:23 -0400, Stefan Monnier wrote:
> > The problem is that CC Mode's before/after-change-functions are very
> > general, and scan the buffer looking for situations which only arise
> > sporadically. Things like an open string getting closed, or a >
> > being inserted which needs to be checked for a template delimiter.
> > However, these expensive checks are performed for _every_ buffer
> > change. Even doing something like inserting a letter or a digit
> > causes the full range of tests to be performed. This is not good.
> Part of the problem is that CC-mode is very eager in its management of
> syntax information: the `syntax-table` text-properties are always kept
> up-to-date over the whole buffer right after every single change.
That is not part of the problem. That is part of the challenge.
> Modes using syntax-propertize work more lazily:
> before-change-functions only marks that some change occurred at
> position POS and the syntax-table properties after that position are
> only updated afterward on-demand.
Yes, but it is somewhat unclear whether, how, and when modes using
syntax-propertize can update syntax-table properties on positions
_before_ a change. This is a prime reason for CC Mode not using this
strategy.
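To make the lazy scheme concrete, here is a toy model of it, written in Python rather than Elisp so that it is self-contained. The class `LazyPropertizer` and its methods are invented for illustration; they are not Emacs functions, and the real syntax-propertize machinery is more elaborate. The model assumes a single "done" watermark and treats a double quote as the only character needing a property.

```python
class LazyPropertizer:
    """Toy model of on-demand syntax propertizing (not Emacs's real code)."""

    def __init__(self, text):
        self.text = text
        self.props = {}  # position -> property; stands in for text properties
        self.done = 0    # everything before this position is up to date

    def before_change(self, pos):
        # Cheap: only remember the earliest position that has been touched.
        self.done = min(self.done, pos)

    def replace_char(self, pos, ch):
        # A buffer change: notify first, then modify the text.
        self.before_change(pos)
        self.text = self.text[:pos] + ch + self.text[pos + 1:]

    def ensure(self, end):
        # Expensive part, run only on demand and only over [done, end).
        end = min(end, len(self.text))
        for pos in range(self.done, end):
            self.props.pop(pos, None)
            if self.text[pos] == '"':
                self.props[pos] = "string-delimiter"
        self.done = max(self.done, end)
```

For example, after `buf = LazyPropertizer('a "b" c')` and `buf.ensure(7)`, calling `buf.replace_char(4, 'x')` drops `buf.done` back to 4, and the next `buf.ensure(7)` rescans only positions 4 to 6. In this simplified model nothing before the changed position is ever recomputed, which is exactly the point in question above.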
> CC-mode tries to make up for it by being more clever about which parts
> of the buffer after position POS actually need to be updated, but when
> there are several consecutive changes, the extra work performed
> between each one of those changes adds up quickly.
My proposal is to reduce this amount of work when it's not needed.
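As a rough illustration of what "reducing the work" could look like, the sketch below puts a cheap filter in front of the expensive checks. It is written in Python for self-containment and is not necessarily the mechanism proposed in this thread. The names `SENSITIVE`, `needs_full_checks` and `after_change` are hypothetical, and the character set is an assumption. A real filter would also have to consider the context around the change, not just the changed text.

```python
# Hypothetical set: characters whose insertion or deletion could alter
# string, comment, or template-delimiter structure in a C-like language.
SENSITIVE = set('"\'\\/*<>\n')


def needs_full_checks(old_text, new_text):
    """Return True if the removed or inserted text has a sensitive character."""
    return any(ch in SENSITIVE for ch in old_text + new_text)


def after_change(old_text, new_text, expensive_checks):
    """Run expensive_checks only when the cheap filter says it may matter.

    Returns True if the expensive checks were run, False otherwise.
    """
    if needs_full_checks(old_text, new_text):
        expensive_checks()
        return True
    return False
```

With this filter, inserting a letter or a digit returns immediately, while inserting a `>` or deleting a `"` still triggers the full range of tests.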
> [ Of course, there are cases where the approach used in
> syntax-propertize loses big time. E.g. if you have a loop that first
> modifies a char near point-min, then asks for the syntax-table
> properties near point-max, and then repeats... performance will suck.
> But luckily I haven't yet seen a real-world use case where
> this occurs. ]
> Maybe another part of the problem is that CC-mode tries to do more than
> most other major modes: e.g. the highlighting of unclosed strings.
> For plain single-line strings this can be fairly cheap, but for
> multiline strings, keeping this information constantly up-to-date over
> the whole buffer can be costly.
CC Mode is successful in this regard. The highlighting with
warning-face of unclosed string openers is a useful feature which other
modes could emulate.
I think I suggested a little while ago that this could be done in
syntactic analysis and font-lock. We have a syntax flag saying "this
character (LF) terminates a style b comment", we could equally well have
a flag saying it terminates a string. Then font-lock could examine the
string terminator, and use string-face or warning-face on the opener
depending on the terminating character.
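The idea can be sketched outside Emacs as follows. This is a Python toy, not font-lock code: the function name `string_opener_faces` is invented, only double-quoted strings are handled, and comments and character literals are ignored. It picks the face for each string opener by looking at what terminated the string.

```python
def string_opener_faces(text):
    """Return {opener_position: face} for double-quoted strings in text."""
    faces = {}
    i, n = 0, len(text)
    while i < n:
        if text[i] != '"':
            i += 1
            continue
        opener = i
        i += 1
        face = "warning-face"         # default: ran off the end of the text
        while i < n:
            ch = text[i]
            if ch == "\\" and i + 1 < n:
                i += 2                # skip the escaped character
                continue
            i += 1
            if ch == '"':
                face = "string-face"  # terminated by a closing quote
                break
            if ch == "\n":
                break                 # terminated by the line end: unclosed
        faces[opener] = face
    return faces
```

Given the text `'x = "ok";\ny = "oops\nz = 1;\n'`, the opener at position 4 gets string-face and the opener at position 14 gets warning-face, because the second string is terminated by the newline rather than by a quote.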
But that's a digression from the topic of this thread.
> Most other major modes just let the font-lock-string-face bleed further
> than the user intended, which requires much less work and works well
> enough for all other syntactic elements (CC-mode doesn't highlight
> unclosed parens, or mismatched parens, or `do` with missing `while`,
> ...). When needed these many different kinds of errors are detected and
> shown to the user via things like flymake or LSP instead, which work
> much more lazily w.r.t. buffer changes, so they don't need the same kind
> of engineering effort to make them fast enough.
> > Thoughts?
> Not sure whether you intend this to be just a change to CC-mode (it does
> sound like it can all be implemented in Elisp) or you intend for some
> change at the C level.
At the Lisp level. I hadn't even considered any C enhancements.
> My gut feeling is that the checks you suggest in (iii) could be
> implemented in Elisp without losing too much performance (they should
> spend most of their time within a few C primitives), tho it depends on
> the specifics of the cases you'll want to catch. Also if you want to
> implement it in C those same specifics will need to be spelled out to
> figure out how a major mode will communicate them to the C code (for
> this to be useful beyond CC-mode, it would need to be very general, so
> it could be tricky to design).
> But to tell you the truth, other than CC-mode, I'm having a hard time
> imagining which other major mode will want to use such a thing.
> Performance of syntax-propertize is not stellar but doesn't seem
> problematic, and it is not too hard to use (its functioning is not
> exactly the same as what a real lexer would do, but you can make use of
> the language spec more or less straightforwardly), ....
Again, can syntax-propertize work on positions _before_ a buffer change?
> .... whereas I get the impression that your suggestion relies on
> properties of the language which are not often used, so are less
> familiar to the average mode implementor (and a language spec is
> unlikely to help you figure out what to do).
If other modes were to use the mechanism, they would need to define
their syntactic cell boundaries, as indeed I have yet to do for CC Mode.
> Maybe if we want to speed things up, we should consider a new parsing
> engine (instead of parse-partial-sexp and syntax-tables) based maybe on
> a DFA for the tokenizer and GLR parser on top. That might arguably be
> more generally useful and easier to use (in the sense that one can more
> or less follow the language spec when implementing the major mode).
That would be a lot of design and a lot of work, and sounds like
something from the distant rather than medium future. The indentation
and font-lock routines would have to be rewritten for each mode using
it.
> Stefan
--
Alan Mackenzie (Nuremberg, Germany).