emacs-devel

Update on tree-sitter structure navigation


From: Yuan Fu
Subject: Update on tree-sitter structure navigation
Date: Fri, 1 Sep 2023 22:01:54 -0700

Hey guys,

In the months after wrapping up tree-sitter stuff in emacs-29, I was thinking 
about how to implement structural navigation and how to extract information 
from the parser with tree-sitter. In emacs-29 we have things like 
treesit-beginning/end-of-defun and treesit-defun-name. I was thinking maybe we 
can generalize this to support getting an arbitrary “thing” at point, moving 
around things, and getting information like the name of a defun, its arglist, 
the parent of a class, the type of a variable declaration, etc., in a 
language-agnostic way.

Also, at the time, we only supported defining things by a regexp matching a 
node’s type, which is often not enough.

And it would be nice to somehow take advantage of tree-sitter queries for 
the features I mentioned above. Tree-sitter queries are what every other editor 
uses for virtually all tree-sitter related features. But in Emacs, we 
mostly only use them for font-lock.

Here’s the progress as of now:

- Functions like treesit-search-forward, treesit-induce-sparse-tree, 
treesit-thing-at-point, treesit--navigate-thing, etc., support a richer set of 
predicates now. Besides a regexp matching the node type, the predicate can also 
be a predicate function, or (REGEXP . FUNC), or a compound predicate like 
(or PRED PRED) or (not PRED).

- There’s now a variable treesit-thing-settings, which holds the definitions of 
things. Then, instead of passing the predicate to the functions I mentioned 
above, you can save the predicate in treesit-thing-settings under a symbol, say 
‘sexp’, and pass the symbol instead, just like thing-at-point.el. (We’ll work 
on integrating with thing-at-point.el later.)

- I can’t think of a good way to integrate tree-sitter queries with the 
navigation functions we have right now. Most importantly, tree-sitter queries 
always search top-down, and you can’t limit the depth they search. OTOH, our 
navigation functions work by traversing the tree node-to-node.

- There’s no progress on getting information like name and type, etc., in a 
language-agnostic way. I haven’t come up with a good interface and/or 
implementation. I encourage interested folks to give it some thought. Bonus 
points for reusing the query files the neovim folks have accumulated :-)
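
To make the predicate forms and the thing table above concrete, here is a 
rough toy model in Python. It is only a sketch of the semantics described in 
this message, not how treesit.c or treesit.el implement them: Node, Or, Not, 
Thing, THING_SETTINGS, matches and search are all made-up names for 
illustration, and the node types in the sample tree are assumptions.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a tree-sitter node."""
    type: str
    text: str = ""
    children: list = field(default_factory=list)


@dataclass(frozen=True)
class Or:
    preds: tuple      # models (or PRED PRED ...)


@dataclass(frozen=True)
class Not:
    pred: object      # models (not PRED)


@dataclass(frozen=True)
class Thing:
    name: str         # models passing a symbol such as 'sexp


# Toy counterpart of treesit-thing-settings: thing name -> predicate.
THING_SETTINGS = {}


def matches(node, pred):
    """Return True if NODE satisfies PRED."""
    if isinstance(pred, str):          # regexp matched against the node type
        return re.search(pred, node.type) is not None
    if isinstance(pred, Thing):        # look the predicate up by name
        return matches(node, THING_SETTINGS[pred.name])
    if isinstance(pred, Or):
        return any(matches(node, p) for p in pred.preds)
    if isinstance(pred, Not):
        return not matches(node, pred.pred)
    if isinstance(pred, tuple):        # models (REGEXP . FUNC)
        regexp, func = pred
        return re.search(regexp, node.type) is not None and bool(func(node))
    if callable(pred):                 # plain predicate function
        return bool(pred(node))
    raise TypeError(f"unsupported predicate: {pred!r}")


def search(root, pred):
    """Collect matching nodes in depth-first, pre-order traversal."""
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        if matches(node, pred):
            found.append(node)
        stack.extend(reversed(node.children))
    return found


def has_body(node):
    return any(child.type == "block" for child in node.children)


# A defun is a function_* node that has a body; a sexp is anything
# except comments and the root node.
THING_SETTINGS["defun"] = ("^function_", has_body)
THING_SETTINGS["sexp"] = Not(Or(("^comment$", "^program$")))

tree = Node("program", children=[
    Node("function_definition", "main",
         [Node("identifier", "main"), Node("block")]),
    Node("comment", "// hi"),
    Node("function_declaration", "helper", [Node("identifier", "helper")]),
    Node("string_literal", '"x"'),
])

defuns = [n.text for n in search(tree, Thing("defun"))]
sexps = [n.type for n in search(tree, Thing("sexp"))]
```

With this sample tree, only the function that has a block child counts as a 
defun, while the sexp search skips the comment and the root. The point is that 
callers pass a name and the per-language table decides what it means.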

Some other things on the TODO list that people can take a stab at:

- Query-based indentation (neovim’s implementation can be a source of 
inspiration)
- Improve c-ts-mode (indentation styles, other cc-mode features, etc) and other 
tree-sitter modes
- Solve the grammar versioning/breaking-change problem: tree-sitter grammars 
don’t have version numbers, so every time the author changes the grammar, our 
queries break, and loading the mode only produces a giant error.
- Major mode fallback/inheritance: this has been discussed many times, but no 
good solution has emerged.
- Isolated ranges. For many embedded languages, each block should be 
independent of the others, but currently all the embedded blocks are connected 
together and parsed by a single parser. We probably need to spawn a parser for 
each block. I’ll probably work on this one next.
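
To illustrate why connected blocks are a problem in the isolated-ranges item 
above, here is a small toy sketch in Python. It uses a bracket-balance checker 
as a stand-in for a real parser, so it says nothing about how tree-sitter or 
treesit.c handle ranges; parse_errors, parse_connected and parse_isolated are 
hypothetical names made up for this example.

```python
def parse_errors(text):
    """Toy 'parser': return sorted positions of unbalanced brackets in TEXT."""
    closers = {")": "(", "]": "[", "}": "{"}
    stack, errors = [], []
    for pos, ch in enumerate(text):
        if ch in "([{":
            stack.append((ch, pos))
        elif ch in closers:
            if stack and stack[-1][0] == closers[ch]:
                stack.pop()
            else:
                errors.append(pos)       # closer with no matching opener
    errors.extend(pos for _, pos in stack)  # openers never closed
    return sorted(errors)


def parse_connected(blocks):
    """One parser sees all blocks as a single stream."""
    return parse_errors("".join(blocks))


def parse_isolated(blocks):
    """One parser per block, so state never leaks between blocks."""
    return [parse_errors(block) for block in blocks]


blocks = ["foo(", "bar)"]
connected = parse_connected(blocks)
isolated = parse_isolated(blocks)
```

Here the connected parse reports no errors, because the opener in the first 
block is paired with the closer in the second, while the per-block parse flags 
a problem in each block separately.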

Finally, feel free to send me an email, or send to emacs-devel and CC me, if 
there are things treesit.c and treesit.el can do better, or if there are nice 
things in neovim and other editors that Emacs ought to have, too.

Yuan

