|
From: | Perry E. Metzger |
Subject: | Re: Tree Sitter (was Re: cc-mode fontification feels random) |
Date: | Wed, 21 Jul 2021 10:43:37 -0400 |
User-agent: | Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Thunderbird/91.0 |
On 7/20/21 21:28, Stefan Monnier wrote:
Thought I would note that there's a substantial literature now on incremental parsing, especially the sort that is needed for editor tools. One doesn't need to reinvent the algorithms, they're out there waiting to be used. The Tree Sitter project is based on previous published work.Alternately, you can only store the length of each token (as tree-sitter does); then when processing queries, you have to add up all the lengths of the preceding tokens in order to report the buffer position of the information you are computing. That is also linear in the length of the buffer.You can probably get better than linear performance in "the usual case" by storing the total length of the subtree at each node of the AST. It's still theoretically linear in the worst case, of course.
There are good links at the end of the https://tree-sitter.github.io/tree-sitter/ web page, but I thought I'd link to some a couple of them directly:
Practical Algorithms for Incremental Software Development Environments https://www2.eecs.berkeley.edu/Pubs/TechRpts/1997/CSD-97-946.pdf Incremental Analysis of Real Programming Languages https://www.semanticscholar.org/paper/Incremental-analysis-of-real-programming-languages-Wagner-Graham/ca69018c29cc415820ed207d7e1d391e2da1656f?p2dfThere's also this paper on error recovery for LR parsers, but for some reason it won't load for me right now.
http://www.dtic.mil/dtic/tr/fulltext/u2/a043470.pdf Perry
[Prev in Thread] | Current Thread | [Next in Thread] |