Re: A possible way for CC Mode to resolve its sluggishness

From: Alan Mackenzie
Subject: Re: A possible way for CC Mode to resolve its sluggishness
Date: Sat, 27 Apr 2019 13:57:25 +0000
User-agent: Mutt/1.10.1 (2018-07-13)

Hello, Stefan.

On Fri, Apr 26, 2019 at 22:10:23 -0400, Stefan Monnier wrote:
> > The problem is that CC Mode's before/after-change-functions are very
> > general, and scan the buffer looking for situations which only arise
> > sporadically.  Things like an open string getting closed, or a >
> > being inserted which needs to be checked for a template delimiter.
> > However, these expensive checks are performed for _every_ buffer
> > change.  Even doing something like inserting a letter or a digit
> > causes the full range of tests to be performed.  This is not good.

> Part of the problem is that CC-mode is very eager in its management of
> syntax information: the `syntax-table` text-properties are always kept
> up-to-date over the whole buffer right after every single change.

That is not part of the problem.  That is part of the challenge.

> Modes using syntax-propertize work more lazily:
> before-change-functions only marks that some change occurred at
> position POS and the syntax-table properties after that position are
> only updated afterward on-demand.

Yes, but it is somewhat unclear whether, how, and when modes using
syntax-propertize can update syntax-table properties on positions
_before_ a change.  This is a prime reason for CC Mode not using it.

> CC-mode tries to make up for it by being more clever about which parts
> of the buffer after position POS actually need to be updated, but when
> there are several consecutive changes, the extra work performed
> between each one of those changes adds up quickly.

My proposal is to reduce this amount of work when it's not needed.
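
As a rough illustration only (not the actual proposal, whose details are
in the earlier message), the idea of skipping the expensive checks for
harmless changes can be modelled in a few lines of Python.  The set of
"significant" characters and the name needs_full_check are hypothetical,
chosen for a C-like language.

```python
import re

# Hypothetical: characters that can open or close strings, comments,
# templates, macros or brace blocks in a C-like language.
SIGNIFICANT = re.compile(r"""["'<>/*\\\n#{}();]""")


def needs_full_check(deleted: str, inserted: str) -> bool:
    """Return True if a change might alter the syntactic structure.

    DELETED is the text removed by the change and INSERTED the text
    added.  If neither contains a significant character, the expensive
    before/after-change scans could, under this model, be skipped.
    """
    return bool(SIGNIFICANT.search(deleted) or SIGNIFICANT.search(inserted))


letter_needs_check = needs_full_check("", "a")       # typing a letter
angle_needs_check = needs_full_check("", ">")        # possible template delimiter
quote_deleted_needs_check = needs_full_check('"', "")  # string delimiter removed
```

A real test would also have to look at the characters next to the
change, since inserting a letter between "/" and "*" destroys a comment
opener without the inserted text itself being significant.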

> [ Of course, there are cases where the approach used in
>   syntax-propertize loses big time.  E.g. if you have a loop that first
>   modifies a char near point-min, then asks for the syntax-table
>   properties near point-max, and then repeats... performance will suck.
>   But luckily I haven't yet seen a real-world use case where
>   this occurs.  ]
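
The lazy scheme and the pathological loop described above can be
sketched as a toy model in Python.  This is not how syntax-propertize is
implemented; the class LazyProps and its counters are assumptions made
for illustration.  The model tracks only the position up to which
properties are valid and counts how many characters get rescanned.

```python
class LazyProps:
    """Toy model of lazily maintained syntax properties."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.done = 0      # properties are valid in [0, done)
        self.scanned = 0   # total characters scanned so far (the cost)

    def before_change(self, pos: int) -> None:
        # A change at POS invalidates everything from POS onward,
        # but does no scanning itself.
        self.done = min(self.done, pos)

    def ensure(self, pos: int) -> None:
        # On demand: scan forward from the valid prefix up to POS.
        if pos > self.done:
            self.scanned += pos - self.done
            self.done = pos


def run(edit_pos: int, size: int = 10_000, repeats: int = 100) -> int:
    """Edit at EDIT_POS, then ask for properties at the buffer end."""
    buf = LazyProps(size)
    for _ in range(repeats):
        buf.before_change(edit_pos)
        buf.ensure(size)
    return buf.scanned


cost_near_end = run(edit_pos=9_990)  # edits close to where properties are needed
cost_near_start = run(edit_pos=0)    # edit near point-min, query near point-max
```

In this model the edits near the end cost one full scan plus ten
characters per change, whereas the loop that edits near the start
rescans the whole buffer on every iteration.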

> Maybe another part of the problem is that CC-mode tries to do more than
> most other major modes: e.g. the highlighting of unclosed strings.
> For plain single-line strings this can be fairly cheap, but for
> multiline strings, keeping this information constantly up-to-date over
> the whole buffer can be costly.

CC Mode is successful in this regard.  The highlighting with
warning-face of unclosed string openers is a useful feature which other
modes could emulate.

I think I suggested a little while ago that this could be done in
syntactic analysis and font-lock.  We have a syntax flag saying "this
character (LF) terminates a style b comment"; we could equally well have
a flag saying it terminates a string.  Then font-lock could examine the
string terminator, and use string-face or warning-face on the opener
depending on the terminating character.

But that's a digression from the topic of this thread.

> Most other major modes just let the font-lock-string-face bleed further
> than the user intended, which requires much less work and works well
> enough for all other syntactic elements (CC-mode doesn't highlight
> unclosed parens, or mismatched parens, or `do` with missing `while`,
> ...).  When needed these many different kinds of errors are detected and
> shown to the user via things like flymake or LSP instead, which work
> much more lazily w.r.t. buffer changes, so they don't need the same kind
> of engineering efforts to make them fast enough.

> > Thoughts?

> Not sure whether you intend this to be just a change to CC-mode (it does
> sound like it can all be implemented in Elisp) or you intend for some
> change at the C level.

At the Lisp level.  I hadn't even considered any C enhancements.

> My gut feeling is that the checks you suggest in (iii) could be
> implemented in Elisp without losing too much performance (they should
> spend most of their time within a few C primitives), tho it depends on
> the specifics of the cases you'll want to catch.  Also if you want to
> implement it in C those same specifics will need to be spelled out to
> figure out how a major mode will communicate them to the C code (for
> this to be useful beyond CC-mode, it would need to be very general, so
> it could be tricky to design).

> But to tell you the truth, other than CC-mode, I'm having a hard time
> imagining which other major mode will want to use such a thing.
> Performance of syntax-propertize is not stellar but doesn't seem
> problematic, and it is not too hard to use (its functioning is not
> exactly the same as what a real lexer would do, but you can make use of
> the language spec more or less straightforwardly), ....

Again, can syntax-propertize work on positions _before_ a buffer change?

> .... whereas I get the impression that your suggestion relies on
> properties of the language which are not often used, so are less
> familiar to the average mode implementor (and a language spec is
> unlikely to help you figure out what to do).

If other modes were to use the mechanism, they would need to define
their syntactic cell boundaries, as indeed I have yet to do for CC Mode.

> Maybe if we want to speed things up, we should consider a new parsing
> engine (instead of parse-partial-sexp and syntax-tables) based maybe on
> a DFA for the tokenizer and GLR parser on top.  That might arguably be
> more generally useful and easier to use (in the sense that one can more
> or less follow the language spec when implementing the major mode).

That would be a lot of design and a lot of work, and sounds like
something from the distant rather than medium future.  The indentation
and font-lock routines would have to be rewritten for each mode using it.

>         Stefan

Alan Mackenzie (Nuremberg, Germany).
