Re: Tree-sitter api

emacs-devel

From:	Yuan Fu
Subject:	Re: Tree-sitter api
Date:	Fri, 17 Sep 2021 13:30:05 -0700

> 
> Why do we need to document the language definitions?  When a Lisp
> programmer defines font-lock and indentation for a programming
> language in the current Emacs, do they necessarily need to consult the
> language grammar?

> […]

> This stuff should be known to TS; the Lisp programmer only needs to be
> aware of the results of lexical and syntactical analysis, in terms of
> their Lisp expressions (Lisp data structures with appropriate symbols
> and fields).

I demonstrated the reason why one needs to consult the source a few messages 
back:

> Tree-sitter has no indentation calculation feature. Major mode writers 
> genuinely need to read the source of the tree-sitter language definition. The 
> source tells us what will be in the syntax tree parsed by tree-sitter, and 
> the node names differ from one language to another. For example, if I want to 
> fontify type identifiers in C with font-lock-type-face, I need to know how a 
> type is represented in the syntax tree. I look up the source[1], and find
> 
>    _type_specifier: $ => choice(
>      $.struct_specifier,
>      $.union_specifier,
>      $.enum_specifier,
>      $.macro_type_specifier,
>      $.sized_type_specifier,
>      $.primitive_type,
>      $._type_identifier
>    ),
> 
> This roughly translates to 
> 
> _type_specifier := <struct_specifier>
>                 | <union_specifier>
>                 | <enum_specifier>
>                 | <macro_type_specifier>
>                 | <sized_type_specifier>
>                 | <primitive_type>
>                 | <_type_identifier>
> 
> in BNF
> 
> From this (and some other hints) I know I need to grab all the _type_specifier 
> nodes in the syntax tree, find their corresponding text in the buffer, and 
> apply font-lock-type-face. Type identifiers in another language will be 
> named differently; tree-sitter doesn’t provide an abstraction for semantic 
> names in the syntax tree.
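
To make the quoted workflow concrete, here is a rough sketch of the "grab the 
type nodes and apply a face" step. It is only an illustration: the Node class 
and the hand-built tree are hypothetical stand-ins, not the tree-sitter API or 
the eventual Emacs API. I also assume that the nodes visible in the tree are 
the alternatives listed in the rule (tree-sitter hides rules whose names start 
with an underscore), and that type_identifier is the visible name of 
_type_identifier.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

# Alternatives of _type_specifier, copied from the grammar excerpt above.
# Assumption: "type_identifier" is the visible name of $._type_identifier.
TYPE_NODE_NAMES = {
    "struct_specifier",
    "union_specifier",
    "enum_specifier",
    "macro_type_specifier",
    "sized_type_specifier",
    "primitive_type",
    "type_identifier",
}


@dataclass
class Node:
    """Toy stand-in for a syntax-tree node (not the tree-sitter API)."""
    type: str
    start: int
    end: int
    children: List["Node"] = field(default_factory=list)


def type_ranges(node: Node) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) for each outermost type-specifier node."""
    if node.type in TYPE_NODE_NAMES:
        yield (node.start, node.end)
        return  # do not descend: the whole specifier gets one face
    for child in node.children:
        yield from type_ranges(child)


source = "int x; struct foo y;"

# Hand-built tree for `source`; a real parser would produce this.
tree = Node("translation_unit", 0, 20, [
    Node("declaration", 0, 6, [
        Node("primitive_type", 0, 3),
        Node("identifier", 4, 5),
    ]),
    Node("declaration", 7, 20, [
        Node("struct_specifier", 7, 17, [Node("type_identifier", 14, 17)]),
        Node("identifier", 18, 19),
    ]),
])

# Pair each matched buffer text with the face a major mode would apply.
faces = [(source[s:e], "font-lock-type-face") for s, e in type_ranges(tree)]
```

The set of node names is the part that has to come from reading each 
language's grammar; the tree walk itself is the same for every language.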


>> And I want to also point out that as Emacs core developers, we can’t 
>> possibly provide a good translation from conventional language names to their 
>> tree-sitter name (C# -> c-sharp). Maybe we can do a half-decent job, but 1) 
>> that won’t cover all available languages, and 2) if there is a new language, 
>> we need to wait for the next release to update our translation. It is better 
>> for the major mode writers to provide the information on how to translate 
>> names.
> 
> The database used by the conversion should definitely be extensible.
> But that doesn't mean it should be empty.
> 
> Anyway, we've spent enough time on this issue.  If you are still
> unconvinced, feel free to do it your way, and let the chips fall as
> they may.

I’ll do it the way I see fit. You can always comment in the final review (or 
something). Thanks.

Yuan
