Re: How to add pseudo vector types

emacs-devel

From:	Stephen Leake
Subject:	Re: How to add pseudo vector types
Date:	Tue, 20 Jul 2021 09:25:11 -0700
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (windows-nt)

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Yuan Fu <casouri@gmail.com>
>> Date: Thu, 15 Jul 2021 12:19:31 -0400
>> Cc: monnier@iro.umontreal.ca,
>>  emacs-devel@gnu.org
>> 
>> > Why do you need to do this when a buffer is updated? why not use
>> > display as the trigger?  Large portions of a buffer will never be
>> > displayed, and some buffers will not be displayed at all.  Why waste
>> > cycles on them?  Redisplay is perfectly equipped to tell you when some
>> > chunk of buffer text is going to be redrawn, and it already knows to
>> > do nothing if the buffer hasn't changed.
>> 
>> Tree-sitter expects you to tell it every single change to the parsed text.
>
> That cannot be true, because the parsed text could be in a state where
> parsing it will fail.  

You can relax this to "when a parse is requested, tree-sitter must be
given the net changes to the text". You can combine several changes into
one, if that saves time or something.
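
As an illustration of combining changes, here is a minimal sketch in
Python that merges two consecutive edits into one covering edit. The
`Edit` tuple and `merge` function are hypothetical names for this
example. The (start, old_end, new_end) layout is an assumption chosen
to resemble the offsets an incremental parser wants, not tree-sitter's
actual API.

```python
from typing import NamedTuple


class Edit(NamedTuple):
    """One text change, described by three offsets."""
    start: int    # where the change begins (same in old and new text)
    old_end: int  # end of the replaced span in the text before the change
    new_end: int  # end of the replacement in the text after the change


def merge(first: Edit, second: Edit) -> Edit:
    """Combine two consecutive edits into one covering edit.

    `second` is expressed in coordinates of the text after `first`.
    The result is relative to the text before `first`.
    """
    delta1 = first.new_end - first.old_end
    delta2 = second.new_end - second.old_end
    return Edit(
        start=min(first.start, second.start),
        # Map second's old end back into the original text.
        old_end=max(first.old_end, second.old_end - delta1),
        # Map first's new end forward into the final text.
        new_end=max(second.new_end, first.new_end + delta2),
    )


# "abcdefghij": insert "XX" at 2, then delete the two characters at 6..8.
net = merge(Edit(2, 2, 4), Edit(6, 8, 6))
print(net)  # Edit(start=2, old_end=6, new_end=6)
```

The merged edit says that the original "cdef" became "XXcd", which
covers both changes. It is coarser than the two separate edits, so the
parser has to re-examine slightly more text in exchange for a shorter
list.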

But tree-sitter does have to deal with incorrect syntax.

> When you are in the middle of writing the code, this is what will
> happen many times, even if you pass the whole buffer to the parser.

Yes.

> And since tree-sitter _must_ be able to deal with this problem, it
> also must be able to receive incomplete parts of the buffer text, and
> do the best it can with it.

That does not follow.

I took that approach with ada-mode, and the results are not good. Mostly
this is because Ada requires always parsing from BOB, so parsing only
part of the buffer is bound to give bad results.

Knowing the changes from a previous complete parse allows the parser to
do a much better job.

>> Say you have a buffer with some content and scrolled through it, so
>> tree-sitter has parsed the whole buffer. Then some elisp edited some
>> text outside the visible portion. Redisplay doesn’t happen, we don’t
>> tell this edit to tree-sitter. Then I scroll to the place that has
>> been edited. What now?
>
> Now you call tree-sitter passing it the part of the buffer that needs
> to be parsed (e.g., the chunk that is about to be displayed).  If
> tree-sitter needs to look back, it will.

No, you pass tree-sitter the net list of changes since the last parse
was requested. Changes outside the visible region can easily affect the
visible region; consider inserting a comment or block start or end.

>> I’ve lost the change information, and tree-sitter’s tree is out-dated.
>
> No information is lost because the updated buffer text is available.

That is useful only if the previous buffer text is also available, so
you can diff it. It is more efficient to keep a list of changes.
Although if that list grows too large, it can be better to simply start
over, and parse the whole buffer again.

> In addition, Emacs records (for redisplay purposes) two places in each
> buffer related to changes: the minimum buffer position before which no
> changes were done since last redisplay, and the maximum buffer
> position beyond which there were no changes.  This can also be used to
> pass only a small part of the buffer to the parser, because the rest
> didn't change.

Again, the input to tree-sitter is a list of changes, not a block of
text containing changes.

That is because of the way incremental parsing works.

The list of changes to the buffer text is used to edit the parse tree,
deleting nodes that represent deleted or modified text, and lexing the
new text to create new nodes.

Then the parser is run on the edited tree, _not_ on the buffer text. The
parser adds new nodes as appropriate to arrive at a complete parse tree.

There's no point in trying to tell the parser how much to parse; any
non-edited portion of the original text will be represented in the
edited tree by one or a small number of nodes; the parser then consumes
those quickly.
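
The reuse of unchanged nodes can be sketched in Python as follows.
This is a toy under loud assumptions: a flat list of word tokens
stands in for the parse tree, and `lex` and `relex` are hypothetical
names. It is not how tree-sitter or wisi is implemented. It only shows
that nodes outside the edit are kept and just the gap is lexed again.

```python
import re
from typing import List, Tuple

Node = Tuple[int, int, str]  # (start, end, text) of one leaf node


def lex(text: str, offset: int = 0) -> List[Node]:
    """Full lex: one node per whitespace-separated word."""
    return [(m.start() + offset, m.end() + offset, m.group())
            for m in re.finditer(r"\S+", text)]


def relex(old_nodes: List[Node], new_text: str,
          start: int, old_end: int, new_end: int) -> Tuple[List[Node], int]:
    """Reuse nodes untouched by the edit; lex only the gap between them.

    Returns the new node list and the number of reused nodes.
    """
    delta = new_end - old_end
    # Nodes strictly left of the edit are kept as they are.
    left = [n for n in old_nodes if n[1] < start]
    # Nodes strictly right of the edit are kept, shifted by the size change.
    right = [(s + delta, e + delta, t) for s, e, t in old_nodes if s > old_end]
    gap_start = left[-1][1] if left else 0
    gap_end = right[0][0] if right else len(new_text)
    fresh = lex(new_text[gap_start:gap_end], gap_start)
    return left + fresh + right, len(left) + len(right)


old_text = "alpha beta gamma delta"
new_text = "alpha BETA2 gamma delta"   # "beta" at 6..10 replaced
nodes, reused = relex(lex(old_text), new_text, 6, 10, 11)
print(reused)  # 3
```

Three of the four nodes are carried over without looking at their
text; only the replaced word is lexed again.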

>> What we can do is to only parse the portion from BOB to the visible
>> portion. So we won’t parse the whole buffer unless you scroll to the
>> bottom.

You can stop parsing at the end of a complete grammar production; in
languages that require parsing from BOB, that is always EOB. The parser
cannot stop at an arbitrary point in the text; that would leave an
incomplete tree.

The point of incremental parsing is that parsing unchanged text is very
fast, because it is represented by a small number of nodes in the edited
tree.

> My primary worry is the fact that you want to use buffer-change hooks
> (and will soon enough want to use post-command-hook as well).  They
> slow down editing, sometimes tremendously, so I'd very much prefer not
> to use those hooks for fontification/parsing.  The original font-lock
> mechanism in Emacs 19 used these hooks; we switched to jit-lock and
> its redisplay-triggered fontifications because the original design had
> problems which couldn't be solved reliably and with reasonable
> performance.  I hope we will not make the mistake of going back to
> that sub-optimal design.

Ah. That could be a problem; incremental parsing fundamentally requires
a list of changes.

If the parser is in an Emacs module, so it has direct access to the
buffer, then the hooks only need to record the buffer positions of the
insertions and deletions, not the new text. That should be very fast.
Then the parse is only requested when the results are needed for
something, like indent or fontify.
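
A minimal sketch of that division of labor, in Python for brevity.
`TrackedBuffer` and its methods are hypothetical names, and the
"parser" is a counter standing in for a real one. The point is only
that the change hook stores three integers per edit and the parse runs
when a consumer asks for results.

```python
class TrackedBuffer:
    """Toy buffer whose change hook records positions only."""

    def __init__(self, text: str):
        self.text = text
        self.pending = []      # (start, old_end, new_end) since last parse
        self.parse_count = 0   # how many times the stand-in parser ran

    def replace(self, start: int, end: int, new: str) -> None:
        """Edit the text; the 'hook' stores three integers, not the text."""
        self.text = self.text[:start] + new + self.text[end:]
        self.pending.append((start, end, start + len(new)))

    def ensure_parsed(self) -> None:
        """Run the stand-in parser only if there are unparsed changes."""
        if self.pending or self.parse_count == 0:
            # A real parser would consume self.pending here and read the
            # new text directly from the buffer.
            self.parse_count += 1
            self.pending.clear()

    def fontify(self, beg: int, end: int) -> str:
        """A consumer of parse results: this is what triggers the parse."""
        self.ensure_parsed()
        return self.text[beg:end]


buf = TrackedBuffer("procedure Foo;")
buf.replace(10, 13, "Bar")     # two edits, no parse yet
buf.replace(0, 0, "-- c\n")
buf.fontify(0, 4)              # first request: one parse for both edits
buf.fontify(0, 4)              # nothing changed: no parse
print(buf.parse_count)  # 1
```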

That is how wisi works, except the parser is currently in an external
process, so the buffer change hooks also have to store the new text,
which can be large. That is a good reason to improve wisi to support
the parser in a module.

In addition, the code that computes the requested information
(fontification or indentation) takes region bounds as input, and only
computes the information for that region (using the full parse tree);
that is much faster than always computing all information for the entire
buffer.
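
A region-bounded query of that kind might look like the following
Python sketch. `FaceIndex` is a hypothetical name, and a sorted list
of (start, end, face) leaves is an assumed stand-in for the full parse
tree. The index is built once per parse, and each request touches only
the nodes overlapping the requested region.

```python
from bisect import bisect_left, bisect_right


class FaceIndex:
    """Sorted, non-overlapping (start, end, face) leaves of a parse."""

    def __init__(self, nodes):
        self.nodes = nodes
        # Built once per parse, not once per query.
        self.starts = [n[0] for n in nodes]
        self.ends = [n[1] for n in nodes]

    def in_region(self, beg: int, end: int):
        """Return the nodes overlapping the half-open region [beg, end)."""
        lo = bisect_right(self.ends, beg)    # first node ending after beg
        hi = bisect_left(self.starts, end)   # first node starting at/after end
        return self.nodes[lo:hi]


index = FaceIndex([
    (0, 9, "keyword"),
    (10, 13, "function-name"),
    (14, 20, "comment"),
    (21, 25, "keyword"),
])
print(index.in_region(10, 15))
# [(10, 13, 'function-name'), (14, 20, 'comment')]
```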

eglot, on the other hand, sends the change information to the LSP server
immediately (or after a small delay), and then tries to do something with
the response, rather than waiting until some event triggers a need for
information from the server.

I'm guessing that font-lock ran the actual fontification functions from
the buffer-change hooks; that would be slow.

-- 
-- Stephe