emacs-devel

Re: Tree-sitter api


From: Yuan Fu
Subject: Re: Tree-sitter api
Date: Wed, 15 Sep 2021 08:56:18 -0700

> On Sep 14, 2021, at 11:15 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Tue, 14 Sep 2021 17:50:48 -0700
>> Cc: Tuấn-Anh Nguyễn <ubolonton@gmail.com>,
>> Theodor Thornhill <theo@thornhill.no>,
>> Clément Pit-Claudel <cpitclaudel@gmail.com>,
>> Emacs developers <emacs-devel@gnu.org>,
>> Stefan Monnier <monnier@iro.umontreal.ca>,
>> stephen_leake@stephe-leake.org
>> 
>>> Almost: there's the (minor) problem of obtaining the "<lang>" part by
>>> the major-mode.  I think it would be good to have a utility function
>>> to do that so that major modes won't need to reinvent the wheel, do
>>> the research, etc.
>> 
>> That’s where I don’t understand: the major mode is written by major mode 
>> writers, who certainly know the correct “<lang>” name: they need to read the 
>> source of the language definition to use language’s tree-sitter features. 
>> You seem to agree on that because you said that this function can be 
>> extended by major mode writers.
> 
> I don't understand what you are saying here.  Why would major mode
> programmers need to know the correct <lang> name?  The TS facilities
> we will have in Emacs will be language-agnostic, right?  For example,
> to correctly indent a line of code, the major mode will call some
> hypothetical tree-sitter-get-indentation function, and that function
> will work in any major mode, provided that the major mode told TS to
> load the support for the programming language of the buffer.  Right?

Now I see why there is confusion. Tree-sitter only provides a “primitive”
feature: the concrete syntax tree, and it is not language-agnostic. You don’t
get indentation for free, unfortunately. Indenting the program using the
information from the syntax tree is our problem. Tree-sitter doesn’t have
anything like a tree-sitter-get-indentation function, and there is no
mechanical way to provide one: a human needs to read the source of the
tree-sitter language definition and figure out how to do it. See below.

> So when the major mode initializes for working with TS, it should tell
> TS which language to load, and why would we request the major mode
> programmer to know the correct <lang> name which corresponds to the
> major mode's programming language?  Why would they need to "read the
> source of the language definition to use language’s tree-sitter
> features"?  The specifics of the TS implementation of, say,
> indentation calculations won't be exposed on the level of the
> indentation facilities provided by TS integration in Emacs, right?

Tree-sitter has no indentation calculation feature. Major mode writers
genuinely need to read the source of the tree-sitter language definition. The
source tells us what will be in the syntax tree parsed by tree-sitter, and the
node names differ from one language to another. For example, if I want to
fontify type identifiers in C with font-lock-type-face, I need to know how
types are represented in the syntax tree. I look up the source[1], and find:

    _type_specifier: $ => choice(
      $.struct_specifier,
      $.union_specifier,
      $.enum_specifier,
      $.macro_type_specifier,
      $.sized_type_specifier,
      $.primitive_type,
      $._type_identifier
    ),

In BNF, this roughly translates to

_type_specifier := <struct_specifier>
                 | <union_specifier>
                 | <enum_specifier>
                 | <macro_type_specifier>
                 | <sized_type_specifier>
                 | <primitive_type>
                 | <_type_identifier>

From this (and some other hints) I know I need to grab all the _type_specifier
nodes in the syntax tree, find their corresponding text in the buffer, and
apply font-lock-type-face. Type identifiers in another language will be named
differently; tree-sitter doesn’t provide an abstraction for semantic names in
the syntax tree.
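
To make the steps above concrete, here is a minimal sketch in JavaScript (the
language of grammar.js). It does not call tree-sitter at all: the tree is a
hand-built stand-in, and the names collectRanges and C_TYPE_NODES are
hypothetical, invented for this illustration. As far as I understand,
tree-sitter leaves rules whose names start with an underscore out of the
resulting tree, so the sketch matches on a few of the visible alternatives
rather than on _type_specifier itself. Treat the node names as assumptions
read off the grammar, not as a stable API.

```javascript
// Text of a tiny C buffer.
const source = "int x; size_t n;";

// Hand-built stand-in for the syntax tree of `source`. A real tree would
// come from the tree-sitter parser; the shape here is only illustrative.
const tree = {
  type: "translation_unit", start: 0, end: 16,
  children: [
    { type: "declaration", start: 0, end: 6, children: [
      { type: "primitive_type", start: 0, end: 3, children: [] },
      { type: "identifier", start: 4, end: 5, children: [] },
    ]},
    { type: "declaration", start: 7, end: 16, children: [
      { type: "type_identifier", start: 7, end: 13, children: [] },
      { type: "identifier", start: 14, end: 15, children: [] },
    ]},
  ],
};

// Node names that count as "a type" in C (assumed, taken from the grammar
// excerpt). Another language would need a different, hand-written set.
const C_TYPE_NODES = new Set([
  "primitive_type",
  "sized_type_specifier",
  "type_identifier",
]);

// Walk the tree depth-first and collect the buffer ranges of wanted nodes.
function collectRanges(node, wanted, out = []) {
  if (wanted.has(node.type)) {
    out.push({ start: node.start, end: node.end });
  }
  for (const child of node.children) {
    collectRanges(child, wanted, out);
  }
  return out;
}

// The ranges a major mode would then fontify with font-lock-type-face.
const typeRanges = collectRanges(tree, C_TYPE_NODES);
const typeTexts = typeRanges.map((r) => source.slice(r.start, r.end));
console.log(typeTexts);
```

The walking code is generic, but the set of node names is not: it has to be
written by someone who has read the grammar for that particular language.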

> 
> There's some misunderstanding here, and I cannot for the life of me
> figure out where is it.

I was very confused, too, for the past several days, but I think we know the 
source of it now.

[1] The source of tree-sitter-c is at 
https://github.com/tree-sitter/tree-sitter-c/blob/master/grammar.js

Yuan

