emacs-devel

Re: Questions about tree-sitter


From: Yuan Fu
Subject: Re: Questions about tree-sitter
Date: Thu, 7 Sep 2023 16:42:44 -0700


> On Sep 6, 2023, at 9:11 AM, Lynn Winebarger <owinebar@gmail.com> wrote:
> 
> On Wed, Aug 30, 2023 at 3:03 AM Yuan Fu <casouri@gmail.com> wrote:
>>> On Aug 29, 2023, at 2:26 PM, Augustin Chéneau (BTuin) <btuin@mailo.com> 
>>> wrote:
>>> I have a few questions about tree-sitter.
>>> 
>>> I'm currently developing a grammar for GNU Bison alongside a tree-sitter
>>> major mode, it's a work in progress.  The grammar is here:
>>> <https://gitlab.com/btuin2/tree-sitter-bison>, still incomplete but so
>>> far able to parse simple files, and the major mode prototype is
>>> attached to this message.
>>> 
>>> So, the questions:
>>> 
>>> 1. Is there a way to reload a grammar?
>>> 
>>> Emacs is pretty nice as a playground for testing grammars, but once a
>>> grammar is loaded, it won't be loaded again until Emacs restarts (as far
>>> as I know).
>>> Is it possible to reload a grammar after modifying it?
>> 
>> No, and it’s probably not easy to implement either, since unloading the
>> grammar would require Emacs to purge/invalidate all the nodes, queries,
>> and parsers using that grammar.
> 
> Reviewing some generated "parser.c" files, and some of the available
> documentation, it appears the parser.c file basically creates a lexing
> function that adheres to a certain protocol in terms of
> producing/consuming a standard lexer state data structure, and an
> LR(1) parser table suitable for GLR parsing (i.e. allows ambiguous
> actions).  These and definitions of the tokens and grammar symbols are
> bundled up in a language structure passed to the tree-sitter library.
> LALR(1) tables are essentially simplified/compressed LR(1) tables, and
> emacs has code to calculate such tables directly in elisp.
> Therefore, given functionality to translate elisp data into the raw C
> structures, we should be able to dynamically create language data
> structures to pass to the tree-sitter library to create a library.
> We would also need a table driven lexer framework in place of the
> generated lexer in the C file to completely avoid going through a C
> compiler.
> The other novel features of tree-sitter parsers appear to be
> implemented in the parser runtime, not in the table calculation.
> 
> I've implemented LALR(1) parser generators two or three times in the
> last couple of decades, this might be a fun project for me while I am
> unambiguously able to contribute to GNU Emacs.
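
To make the "grammar as plain data" idea in the quoted text concrete, here is a minimal sketch of a table-driven LR parse loop. Everything in it is an assumption for illustration: the toy grammar (E -> E + n | n), the hand-built ACTION/GOTO tables, and the names are hypothetical and are not tree-sitter's actual table layout. It is also a plain deterministic LR driver with one action per cell, whereas a GLR table may hold several actions per cell.

```python
# Hand-built LR tables for the toy grammar (hypothetical, for illustration):
#   rule 1: E -> E '+' 'n'
#   rule 2: E -> 'n'
# ACTION maps (state, terminal) to a single action; "$" marks end of input.
ACTION = {
    (0, "n"): ("shift", 2),
    (1, "+"): ("shift", 3),
    (1, "$"): ("accept",),
    (2, "+"): ("reduce", 2),
    (2, "$"): ("reduce", 2),
    (3, "n"): ("shift", 4),
    (4, "+"): ("reduce", 1),
    (4, "$"): ("reduce", 1),
}
# GOTO maps (state, nonterminal) to the state entered after a reduction.
GOTO = {(0, "E"): 1}
# RULES maps a rule number to (left-hand side, length of right-hand side).
RULES = {1: ("E", 3), 2: ("E", 1)}


def parse(tokens):
    """Run the generic LR loop over the tables and return a nested tuple tree."""
    stack = [0]          # parser state stack
    nodes = []           # partial parse trees, parallel to the shifted symbols
    stream = list(tokens) + ["$"]
    pos = 0
    while True:
        tok = stream[pos]
        action = ACTION.get((stack[-1], tok))
        if action is None:
            raise SyntaxError(f"unexpected {tok!r} at position {pos}")
        if action[0] == "shift":
            stack.append(action[1])
            nodes.append(tok)
            pos += 1
        elif action[0] == "reduce":
            lhs, length = RULES[action[1]]
            children = nodes[-length:]
            del nodes[-length:]
            del stack[-length:]
            nodes.append((lhs, children))
            stack.append(GOTO[(stack[-1], lhs)])
        else:  # accept
            return nodes[0]
```

The driver never changes; swapping in different ACTION, GOTO, and RULES data gives a parser for a different grammar, which is why generating the tables at runtime would avoid the C compiler for this part.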

That’ll be great. But note that the parser structure has escape hatches:
certain things can be implemented by arbitrary C functions. Also, tree-sitter
allows grammars to use custom (external) scanners [1].

[1] https://tree-sitter.github.io/tree-sitter/creating-parsers#external-scanners
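
As a rough illustration of why custom scanners complicate a purely table-driven approach, here is a toy model in which a lexer driven by a DFA table accepts an optional scanner callback. All names are hypothetical and this is not tree-sitter's scanner API; the assumption is only that the custom scanner gets the first chance at a token and the table is the fallback. The nested-comment scanner needs a depth counter, which a finite transition table cannot express.

```python
# Transition table of a tiny DFA: (state, character class) -> next state.
DFA = {
    ("start", "digit"): "number",
    ("number", "digit"): "number",
    ("start", "alpha"): "ident",
    ("ident", "alpha"): "ident",
    ("ident", "digit"): "ident",
}
# Accepting states and the token kind each one produces.
ACCEPTING = {"number": "NUMBER", "ident": "IDENT"}


def char_class(ch):
    if ch.isdigit():
        return "digit"
    if ch.isalpha() or ch == "_":
        return "alpha"
    return "other"


def table_scan(text, pos):
    """Longest-match scan using only the DFA data; returns (kind, end) or None."""
    state, best, i = "start", None, pos
    while i < len(text):
        state = DFA.get((state, char_class(text[i])))
        if state is None:
            break
        i += 1
        if state in ACCEPTING:
            best = (ACCEPTING[state], i)
    return best


def scan_block_comment(text, pos):
    """Hypothetical custom scanner: nested /* ... */ comments, tracked by a counter."""
    if not text.startswith("/*", pos):
        return None
    depth, i = 0, pos
    while i < len(text):
        if text.startswith("/*", i):
            depth += 1
            i += 2
        elif text.startswith("*/", i):
            depth -= 1
            i += 2
            if depth == 0:
                return ("COMMENT", i)
        else:
            i += 1
    return None  # unterminated comment: let the caller report the error


def tokenize(text, external_scanner=None):
    """Try the custom scanner first (if any), then fall back to the table."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        result = external_scanner(text, pos) if external_scanner else None
        if result is None:
            result = table_scan(text, pos)
        if result is None:
            raise SyntaxError(f"cannot tokenize at position {pos}")
        kind, end = result
        tokens.append((kind, text[pos:end]))
        pos = end
    return tokens
```

For example, `tokenize("x1 /* a /* b */ c */ 42", scan_block_comment)` yields an identifier, one comment token spanning both nesting levels, and a number, while the same input without the callback fails at the first `/`. The callback is code rather than data, so a grammar that relies on one cannot be described by tables alone.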

Yuan

