Re: Tree-sitter api
From: Yuan Fu
Subject: Re: Tree-sitter api
Date: Wed, 15 Sep 2021 08:56:18 -0700
> On Sep 14, 2021, at 11:15 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Tue, 14 Sep 2021 17:50:48 -0700
>> Cc: Tuấn-Anh Nguyễn <ubolonton@gmail.com>,
>> Theodor Thornhill <theo@thornhill.no>,
>> Clément Pit-Claudel <cpitclaudel@gmail.com>,
>> Emacs developers <emacs-devel@gnu.org>,
>> Stefan Monnier <monnier@iro.umontreal.ca>,
>> stephen_leake@stephe-leake.org
>>
>>> Almost: there's the (minor) problem of obtaining the "<lang>" part by
>>> the major-mode. I think it would be good to have a utility function
>>> to do that so that major modes won't need to reinvent the wheel, do
>>> the research, etc.
>>
>> That’s where I don’t understand: the major mode is written by major mode
>> writers, who certainly know the correct “<lang>” name: they need to read the
>> source of the language definition to use language’s tree-sitter features.
>> You seem to agree on that because you said that this function can be
>> extended by major mode writers.
>
> I don't understand what you are saying here. Why would major mode
> programmers need to know the correct <lang> name? The TS facilities
> we will have in Emacs will be language-agnostic, right? For example,
> to correctly indent a line of code, the major mode will call some
> hypothetical tree-sitter-get-indentation function, and that function
> will work in any major mode, provided that the major mode told TS to
> load the support for the programming language of the buffer. Right?
Now I see where the confusion comes from. Tree-sitter provides only a
“primitive” feature: the concrete syntax tree, and that tree is not
language-agnostic. You don’t get indentation for free, unfortunately.
Indenting the program using the information in the syntax tree is our
problem. Tree-sitter doesn’t have anything like a tree-sitter-get-indentation
function, and there is no mechanical way to provide one: a human needs to
read the source of the tree-sitter language definition and figure out how to
do it. See below.
> So when the major mode initializes for working with TS, it should tell
> TS which language to load, and why would we request the major mode
> programmer to know the correct <lang> name which corresponds to the
> major mode's programming language? Why would they need to "read the
> source of the language definition to use language’s tree-sitter
> features"? The specifics of the TS implementation of, say,
> indentation calculations won't be exposed on the level of the
> indentation facilities provided by TS integration in Emacs, right?
Tree-sitter has no indentation calculation feature. Major mode writers
genuinely need to read the source of the tree-sitter language definition. The
source tells us what will be in the syntax tree parsed by tree-sitter, and the
node names differ from one language to another. For example, if I want to
fontify type identifiers in C with font-lock-type-face, I need to know how a
type is represented in the syntax tree. I look up the source[1] and find
    _type_specifier: $ => choice(
      $.struct_specifier,
      $.union_specifier,
      $.enum_specifier,
      $.macro_type_specifier,
      $.sized_type_specifier,
      $.primitive_type,
      $._type_identifier
    ),
In BNF, this roughly translates to

    _type_specifier := <struct_specifier>
                     | <union_specifier>
                     | <enum_specifier>
                     | <macro_type_specifier>
                     | <sized_type_specifier>
                     | <primitive_type>
                     | <_type_identifier>
From this (and some other hints) I know I need to grab all the _type_specifier
nodes in the syntax tree, find their corresponding text in the buffer, and
apply font-lock-type-face. Type identifiers in another language will be
named differently: tree-sitter doesn’t provide an abstraction for semantic
names in the syntax tree.
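To make those steps concrete, here is a minimal sketch in JavaScript (the
language grammar.js is written in). It does not use the tree-sitter library:
the tree is hand-built, and the names typeNodeNames and collectTypeRanges, as
well as the node types listed in the table, are assumptions for illustration
only. It walks the tree, collects the ranges of the nodes whose type is listed
for the language, and maps those ranges back to text. In Emacs, those ranges
are where font-lock-type-face would be applied.

```javascript
// Source text and a hand-built toy syntax tree for it. Each node has a type,
// a half-open [start, end) range into the source, and a list of children.
// A real tree would come from a parser; this one is an assumption.
const source = "int x; size_t n;";
const tree = {
  type: "translation_unit", start: 0, end: 16, children: [
    { type: "declaration", start: 0, end: 6, children: [
      { type: "primitive_type", start: 0, end: 3, children: [] },
      { type: "identifier", start: 4, end: 5, children: [] },
    ] },
    { type: "declaration", start: 7, end: 16, children: [
      { type: "type_identifier", start: 7, end: 13, children: [] },
      { type: "identifier", start: 14, end: 15, children: [] },
    ] },
  ],
};

// Hypothetical per-language table. A human fills it in after reading each
// language's grammar, because node names are not shared across languages.
const typeNodeNames = {
  c: new Set(["primitive_type", "type_identifier"]),
};

// Depth-first walk that records the range of every node whose type is in
// the given set of names.
function collectTypeRanges(node, names, out = []) {
  if (names.has(node.type)) out.push([node.start, node.end]);
  for (const child of node.children) collectTypeRanges(child, names, out);
  return out;
}

// Map the collected ranges back to the text they cover.
const ranges = collectTypeRanges(tree, typeNodeNames.c);
const typeTexts = ranges.map(([start, end]) => source.slice(start, end));
console.log(typeTexts);
```

Supporting a second language in this sketch would mean adding another entry
to typeNodeNames with that grammar's own node names; the walk itself would
stay the same.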
>
> There's some misunderstanding here, and I cannot for the life of me
> figure out where is it.
I was very confused, too, for the past several days, but I think we know the
source of it now.
[1] The source of tree-sitter-c is at
https://github.com/tree-sitter/tree-sitter-c/blob/master/grammar.js
Yuan