Re: How to add pseudo vector types

From: Ergus
Subject: Re: How to add pseudo vector types
Date: Tue, 27 Jul 2021 01:40:53 +0200

On Mon, Jul 26, 2021 at 12:40:31PM -0400, Yuan Fu wrote:

unless the narrowing is for multi-major-mode.

And what would you do in that case, if you allow TS to look beyond the

In the multi-major-mode case, there is a separate parser for each
language, and each sub-mode region in the text gets its own parse
tree (i.e., it acts like a separate file), and that parse tree is only
told about changes to its region. So the parser will never try to
look outside the region; it doesn't need to know about narrowing.

Once again, we are talking about the function used by TS to read
buffer text.  Not about the parser or its caller.  Low-level code,
which knows nothing about the context, should never look beyond the

It does no harm for tree-sitter to see the rest of the buffer: it
doesn’t modify anything, all it does is read the text. OTOH,
restricting tree-sitter to the bounds of the narrowing adds complexity
for no benefit (as far as I can see). Maybe narrowing is the context
that low-level code should ignore, or at least that tree-sitter should
ignore. The only benefit that I can think of is “we firmly adhere to
the ‘contract’ that no one can look beyond the narrowed region”, but is
it a good contract? Is there really a contract in the first place? IMO,
narrowing acts like masking tape over the rest of the buffer, so that
user edits like re-replace don’t spill out. Demanding that nothing in
Emacs have access to the rest of the buffer is dogmatic (in the sense
that it is too rigid and simply follows the doctrine blindly).

Hi Yuan:

Speaking from my absolute ignorance of tree-sitter and your changes:
there is a function, ts_parser_set_included_ranges, that I once used to
reduce the parsing region and (notably) improve performance in a test
API.

Couldn't narrowed regions use that? I think it is the same idea, but I
am probably wrong.

Limiting the region to parse to the modified region (which in Emacs may
be known thanks to the gap and maybe the undo-tree) and passing the
output tree from the previous parse as the `old_tree` parameter of
ts_parser_parse_string made tree-sitter incredibly fast in my case
(fast enough to run it on every key press).

In my case, using old_tree reduced the time by a factor of 10 in a big
source file, and limiting the parser to only the "changed" region made
it almost instant in more than 80% of the executions with small
modifications. (I repeat: it was a much simpler use case.)
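
As a rough illustration of what the "changed" region means here, the
sketch below finds the smallest byte range that differs between the old
and the new text by trimming their common prefix and suffix. The struct
EditRange and the function compute_edit_range are hypothetical names
made up for this example; the three fields mirror the byte fields of
tree-sitter's TSInputEdit (start_byte, old_end_byte, new_end_byte),
which is what ts_tree_edit expects before the old tree is passed back
as old_tree. Comparing whole strings is only an assumption for the sake
of a self-contained example; an editor would normally already know
where the edit happened.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical struct for illustration only: it mirrors the byte fields
   of tree-sitter's TSInputEdit, without the row/column points. */
typedef struct {
    size_t start_byte;   /* first byte that differs */
    size_t old_end_byte; /* end of the changed span in the old text */
    size_t new_end_byte; /* end of the changed span in the new text */
} EditRange;

/* Compute the minimal changed byte range between two NUL-terminated
   strings by skipping their common prefix and common suffix. */
static EditRange compute_edit_range(const char *old_text, const char *new_text)
{
    assert(old_text != NULL && new_text != NULL);

    size_t old_len = strlen(old_text);
    size_t new_len = strlen(new_text);
    size_t min_len = old_len < new_len ? old_len : new_len;

    /* Length of the common prefix. */
    size_t prefix = 0;
    while (prefix < min_len && old_text[prefix] == new_text[prefix])
        prefix++;

    /* Length of the common suffix, never overlapping the prefix. */
    size_t suffix = 0;
    while (suffix < min_len - prefix
           && old_text[old_len - 1 - suffix] == new_text[new_len - 1 - suffix])
        suffix++;

    EditRange r = { prefix, old_len - suffix, new_len - suffix };
    return r;
}
```

For example, going from "int x = 1;" to "int xy = 1;" gives a range that
starts at byte 5, is empty in the old text, and covers one byte in the
new text, i.e. a single-character insertion.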

And about language definitions and font-locking, I just realized that
tree-sitter language definitions provide highlighting patterns, and we
only need to minimally modify them to use them for Emacs, so there
isn’t much manual effort involved.

I think tree-sitter's language definitions are much richer than what
Emacs has for some languages, and we will probably want to support them
properly. So maybe, instead of just modifying what is in tree-sitter to
make it similar to what Emacs currently has, we could just use the
node's syntactic information and then let Emacs use it, adding more
faces if needed... Does that make sense?

The idea is to have real syntactic information on the text itself,
because that may help in the future to implement indentation and
navigation commands using tree-sitter's information (commands like
up-list or forward-sexp would be the equivalent of
ts_tree_cursor_goto_parent or ts_tree_cursor_goto_next_sibling).
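
To make the analogy concrete, here is a toy model of cursor-style
navigation. It does not use tree-sitter itself: Node, Cursor,
node_add_child, cursor_goto_parent and cursor_goto_next_sibling are all
hypothetical names for this sketch. The two cursor functions only
imitate the convention of ts_tree_cursor_goto_parent and
ts_tree_cursor_goto_next_sibling, which report whether the cursor
actually moved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy syntax node, not a tree-sitter type. */
typedef struct Node {
    const char *type;
    struct Node *parent;
    struct Node *first_child;
    struct Node *next_sibling;
} Node;

/* Hypothetical cursor: just a pointer to the current node. */
typedef struct {
    Node *node;
} Cursor;

/* Append child as the last child of parent. */
static void node_add_child(Node *parent, Node *child)
{
    assert(parent != NULL && child != NULL);
    child->parent = parent;
    child->next_sibling = NULL;
    if (parent->first_child == NULL) {
        parent->first_child = child;
        return;
    }
    Node *last = parent->first_child;
    while (last->next_sibling != NULL)
        last = last->next_sibling;
    last->next_sibling = child;
}

/* Rough analogue of up-list: move to the parent node.
   Returns 1 if the cursor moved, 0 if it was already at the root. */
static int cursor_goto_parent(Cursor *c)
{
    if (c->node == NULL || c->node->parent == NULL)
        return 0;
    c->node = c->node->parent;
    return 1;
}

/* Rough analogue of forward-sexp: move to the next sibling.
   Returns 1 if the cursor moved, 0 if there is no next sibling. */
static int cursor_goto_next_sibling(Cursor *c)
{
    if (c->node == NULL || c->node->next_sibling == NULL)
        return 0;
    c->node = c->node->next_sibling;
    return 1;
}
```

With a "call" node that has two children, "identifier" and "arguments",
a cursor on "identifier" moves to "arguments" with one next-sibling
step, cannot move further sideways, and reaches "call" with one parent
step.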

Also, does anyone have thoughts on how tree-sitter should integrate
with font-lock beyond the current simple interface?

No idea, but in my experience the most efficient way to traverse a
tree-sitter tree is with ts_tree_cursor, though maybe for font-lock the
best approach is just to use ts_tree_get_changed_ranges.


