Update on tree-sitter structure navigation
From: Yuan Fu
Subject: Update on tree-sitter structure navigation
Date: Fri, 1 Sep 2023 22:01:54 -0700
Hey guys,
In the months after wrapping up the tree-sitter work in emacs-29, I was thinking
about how to implement structural navigation and extract information from
the parser with tree-sitter. In emacs-29 we have things like
treesit-beginning/end-of-defun and treesit-defun-name. I was thinking maybe we
can generalize this to support getting an arbitrary “thing” at point, moving
around things, and getting information like the name of a defun, its arglist, the
parent of a class, the type of a variable declaration, etc., in a language-agnostic way.
Also, at the time, we only supported defining things by a regexp matching a
node’s type, which is often not enough.
And it would be nice to somehow take advantage of tree-sitter queries for
the features I mentioned above. Tree-sitter queries are what every other editor
uses for virtually all tree-sitter-related features. But in Emacs, we
mostly use them only for font-lock.
Here’s the progress as of now:
- Functions like treesit-search-forward, treesit-induce-sparse-tree,
treesit-thing-at-point, treesit--navigate-thing, etc., support a richer set of
predicates now. Besides a regexp matching the node type, the predicate can also be a
predicate function, or (REGEXP . FUNC), or a compound predicate like (or PRED
PRED) or (not PRED).
- There’s now a variable treesit-thing-settings, which holds the definitions of
things. Then, instead of passing the predicate to the functions I mentioned
above, you can save the predicate in treesit-thing-settings under a symbol, say
‘sexp’, and pass the symbol instead, just like thing-at-point.el. (We’ll work
on integrating with thing-at-point.el later.)
- I can’t think of a good way to integrate tree-sitter queries with the
navigation functions we have right now. Most importantly, a tree-sitter query
always searches top-down, and you can’t limit the depth it searches. OTOH, our
navigation functions work by traversing the tree node-to-node.
- There’s no progress on getting information like name and type, etc., in a
language-agnostic way. I haven’t come up with a good interface and/or
implementation. I encourage interested folks to give it some thought. Bonus
points for reusing the query files the neovim folks have accumulated :-)
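
To make the predicate forms and the symbol lookup above concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not the treesit.el implementation: Node, matches, things, and THING_SETTINGS are made-up names, the tree is built by hand rather than by a parser, and tuples stand in for the Lisp forms (or PRED PRED), (not PRED), and (REGEXP . FUNC).

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a tree-sitter node (hypothetical, not a real API)."""
    type: str
    text: str = ""
    children: list = field(default_factory=list)


def matches(node, pred):
    """Return True if node satisfies pred, for each predicate form."""
    if isinstance(pred, str):        # regexp matched against the node type
        return re.search(pred, node.type) is not None
    if callable(pred):               # predicate function
        return bool(pred(node))
    head, *rest = pred
    if head == "or":                 # models (or PRED PRED ...)
        return any(matches(node, p) for p in rest)
    if head == "not":                # models (not PRED)
        return not matches(node, rest[0])
    regexp, func = pred              # models (REGEXP . FUNC): both must hold
    return re.search(regexp, node.type) is not None and bool(func(node))


# Hypothetical analogue of treesit-thing-settings: thing name -> predicate.
THING_SETTINGS = {
    "defun": "function_definition",
    "sexp": ("not", r"^[{}]$"),
    "statement": ("or", "return_statement", "declaration"),
    "test-defun": ("function_definition",
                   lambda n: n.text.startswith("test_")),
}


def walk(node):
    """Visit the tree node-to-node, depth first, parents before children."""
    yield node
    for child in node.children:
        yield from walk(child)


def things(root, name):
    """Look up the predicate saved under name and collect matching nodes."""
    pred = THING_SETTINGS[name]
    return [n for n in walk(root) if matches(n, pred)]


# A hand-built tree standing in for a parsed C file.
root = Node("translation_unit", children=[
    Node("function_definition", "test_parse",
         [Node("{"), Node("return_statement"), Node("}")]),
    Node("declaration", "counter"),
    Node("function_definition", "main", [Node("{"), Node("}")]),
])

defun_names = [n.text for n in things(root, "defun")]
test_defun_names = [n.text for n in things(root, "test-defun")]
```

Callers pass only a name such as "defun", so a mode can change what counts as a defun in one place, which is the point of saving predicates under symbols.
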
Some other things on the TODO list that people can take a stab at:
- Query-based indentation (neovim’s implementation can be a source of
inspiration)
- Improve c-ts-mode (indentation styles, other cc-mode features, etc) and other
tree-sitter modes
- Solve the grammar versioning/breaking-change problem: tree-sitter grammars
don’t have version numbers, so every time an author changes a grammar, our
queries break, and loading the mode only produces a giant error.
- Major mode fallback/inheritance: this has been discussed many times, but no good
solution has emerged.
- Isolated ranges. For many embedded languages, each block should be
independent of the others, but currently all the embedded blocks are connected
together and parsed by a single parser. We probably need to spawn a parser for
each block. I’ll probably work on this one next.
Finally, feel free to send me an email, or send to emacs-devel and CC me, if
there are things treesit.c and treesit.el can do better, or if there are nice
things in neovim and other editors that Emacs ought to have, too.
Yuan