Re: New optimisations for long raw strings in C++ Mode.

On Wed, Aug 10, 2022, 1:44 PM Eli Zaretskii <eliz@gnu.org> wrote:

> Really? Then please tell me how is it that we the humans can detect
> incorrect fontifications even when shown partial strings and comments?
> We know that fontifications are incorrect, and where strings or
> comments start or end immediately, just after a single glance. We
> never need to go to BOB to find that out.

Serious question: is fontification intended to display text according to what the author probably intended, or according to how a compiler will process that text (leaving correctness to a more precise tool than font-lock, whether Semantic, tree-sitter, LSP, or whatever)?

Because I can definitely write code with some subtle issue that I will miss, so that I erroneously think it should display one way when it would actually be processed another way. Should fontification show my likely intention (plus, and only for bonus points, possibly highlight the error that separates the likely intended parse from the actual one), or should it display the text the way the tools will interpret it, so that the author finds errors that way?

When I use a dedicated IDE of recent vintage, it feels less like I am writing a stream of characters than filling in partially constructed objects representing the abstract syntax of the language I'm writing in (with a grammar that has allowances for incomplete or erroneous constructs), with the text being displayed as a representation of the underlying object. IOW, the relationship between the syntactic object and the text is inverted compared to Emacs's design, where (if I understand correctly) the properties of the syntactic object are tied to the text only through text properties. With the other approach, the fontification and the syntax object are tied together, but in Emacs the relationship seems much more tenuous. E.g. completion and fontification are completely separate activities as far as I know, though the same contextual information should be useful for both.

I have this CC-mode derived mode for a DSL I did not design. I'm currently the sole user of the mode, so I just wanted something quick and dirty. But as the pile of code I deal with in this DSL grows, I want to add Semantic support for it to get context-aware completion, precise fontification, etc. The current discussion has made me wonder whether deriving from CC mode is having some non-obvious effects on how font-lock works, making it non-local in ways that are not necessary, so that the re-entrant nature of the Semantic parsers won't cure some of the slowness. For example, I want to use the font-lock of that mode in the REPL to fontify the statements/expressions I enter at the prompt, but otherwise ignore text, particularly at the end and the beginning of the REPL buffer. I don't want to narrow the buffer, just the area fontification applies to. Fontifying hundreds of megabytes of tracing print statements is not just unnecessary, it's bad news for the GC even after the buffer is cleared, IME.

If CC mode is determining more syntactic information than tree-sitter's incremental parsing provides (per Immanuel Lizroth's comment in this thread), then there is a disconnect somewhere in the scope of expectations for what font-lock is supposed to do. I'm certainly not clear (yet) on how to cleanly separate and then rejoin a proper syntactic analysis with fontification, or on whether there is "an Emacs way" to do it.

Lynn

From:	Lynn Winebarger
Subject:	Re: New optimisations for long raw strings in C++ Mode.
Date:	Fri, 12 Aug 2022 09:05:06 -0400