[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Tree Sitter (was Re: cc-mode fontification feels random)

From: Daniel Colascione
Subject: Re: Tree Sitter (was Re: cc-mode fontification feels random)
Date: Wed, 21 Jul 2021 09:21:38 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0

On 7/21/21 7:43 AM, Perry E. Metzger wrote:
On 7/20/21 21:28, Stefan Monnier wrote:
Alternately, you can only store the length of each token (as tree-sitter
does); then when processing queries, you have to add up all the lengths
of the preceding tokens in order to report the buffer position of the
information you are computing. That is also linear in the length of the
You can probably get better than linear performance in "the usual case"
by storing the total length of the subtree at each node of the AST.
It's still theoretically linear in the worst case, of course.

Thought I would note that there's a substantial literature now on incremental parsing, especially the sort that is needed for editor tools. One doesn't need to reinvent the algorithms, they're out there waiting to be used. The Tree Sitter project is based on previous published work.

There is indeed a big literature! I wish there were a bigger literature on *composable* incremental parsers though. IMHO, what we need is an incremental GLR system (yes, GLR is bad worst-case, but it's not a practical concern) that spits out a parse *forest* which we then pare down to a parse tree with ad-hoc syntactic consistency rules. Something like this naturally supports multi-language modes and incorporation of out-of-band semantic information.

reply via email to

[Prev in Thread] Current Thread [Next in Thread]