emacs-devel

Re: [SPAM UNSURE] Re: Tree-sitter api


From: Stephen Leake
Subject: Re: [SPAM UNSURE] Re: Tree-sitter api
Date: Tue, 24 Aug 2021 07:59:09 -0700
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (windows-nt)

Yuan Fu <casouri@gmail.com> writes:

>> 
>> ada-mode takes the approach of embedding the indent rules directly in
>> the grammar, and the functions that do that provide a few more options
>> than yours. To see the definition of those functions, you'll have to
>> install the wisi package, and look in wisi.info, section Grammar
>> actions. (it would be nice if that info/html file was linked from the
>> GNU ELPA package page; I'll start a new thread for that).
>
> I had a cursory look at the manual for indent in wisi and have some
> questions. Why does wisi indent from “low-level productions”? 

The indent of every new-line must be specified, and low-level
productions can contain new-lines.

> (I think most indentation engines work line-by-line from the first
> line.) I don’t know much about how wisi works, but the indentation
> system seems to stem from circumstances quite different from those of
> tree-sitter. For example, wisi’s indent is devised alongside the
> grammar definition, while for tree-sitter, all the hard work of
> defining the grammar is done for me and I’m merely a user of the grammar:
> that makes indenting with tree-sitter a much simpler job.

The Ada grammar is taken from the Ada Reference Manual; the indent
information is added after. The indent information could be in a
separate file, as in tree-sitter (wisitoken does not currently support
this; there would need to be a way to specify which production the
indent rule is associated with).

A tree-sitter based indent engine still has to specify the indent of
every new-line; it's the same amount of information.

Taking the examples from your email:

>     ((match nil "function_definition" "body") parent 0)

> means “match the node whose parent’s type is
> “function_definition” and whose field name is “body”, and indent it
> to the start of its parent”. That indents the opening brace in

> int main ()
> {
> }

Referring to the tree-sitter-c grammar at
https://github.com/tree-sitter/tree-sitter-c/blob/master/grammar.js,
there is a C grammar production (in tree-sitter syntax):

  function_definition: $ => seq(
      optional($.ms_call_modifier),
      $._declaration_specifiers,
      field('declarator', $._declarator),
      field('body', $.compound_statement)
    ),

In wisitoken syntax, this is:

  function_definition : [ms_call_modifier] declaration_specifiers
    declarator=declarator body=compound_statement

(the current wisi user guide does not define the "=" syntax for
declaring token names, but it is supported; I'll add it to the user
guide)

The indent rule specifies the indent of the field named 'body',
relative to the start of the production. So in wisitoken, this would
specify one component of the indent action for this production:

    {(wisi-indent-action [nil nil nil (body . 0)])}

Presumably there are other rules that specify the indent of the other
tokens in that production, so they would not be 'nil'. In wisitoken,
'nil' means "undefined"; it is an error for any new-line to have an
undefined indent after all indent actions are applied.
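
To make the "nil means undefined" rule concrete, here is a minimal
sketch in Python. It is a simplified model, not wisi's implementation:
the function names, the per-line table, and the assumption that an
offset only affects a token that begins a line are all mine.

```python
def apply_indent_action(line_indents, token_lines, parent_column, action):
    """Merge one production's indent vector into the per-line table.

    line_indents  -- dict: line number -> indent column, None if undefined
    token_lines   -- per token: the line it begins, or None if the token
                     is absent or does not start a line
    parent_column -- column where the production starts
    action        -- one offset per token; None stands in for wisi's nil
    """
    for line, offset in zip(token_lines, action):
        if line is not None and offset is not None:
            line_indents[line] = parent_column + offset


def check_all_defined(line_indents):
    """Raise if any line still has no indent after all actions ran."""
    undefined = sorted(l for l, col in line_indents.items() if col is None)
    if undefined:
        raise ValueError(f"undefined indent on lines {undefined}")


# The three-line "int main ()" example: only the line holding the
# opening brace is covered by the action [nil nil nil 0].
indents = {1: None, 2: None, 3: None}
apply_indent_action(indents, [None, 1, None, 2], 0, [None, None, None, 0])
```

With only this one action applied, check_all_defined(indents) raises,
because lines 1 and 3 are still undefined; other productions would have
to supply those.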

Next example:

    ((parent-is "call_expression") parent 2)

The production is:

  call_expression: $ => prec(PREC.CALL, seq(
      field('function', $._expression),
      field('arguments', $.argument_list)
    )),

In wisitoken syntax (wisitoken does not yet support precedence
declarations):

  call_expression : function=expression arguments=argument_list
    {(wisi-indent-action [nil (arguments . 2)])}

So your syntax for indent is much more verbose than the wisi syntax
(because each token gets a separate rule), but specifies the same
information.

Your syntax also requires naming each token that is referenced in an
indent rule; wisitoken can identify tokens by position instead. That is
the main reason indent is specified directly in the grammar file: it is
very easy to associate each indent expression with the corresponding
token, without having to make up names for the tokens. Here are the
above wisitoken productions without the token names:

  function_definition : [ms_call_modifier] declaration_specifiers
    declarator compound_statement
    {(wisi-indent-action [nil nil nil 0])}

  call_expression : expression argument_list
    {(wisi-indent-action [nil 2])}

To be fair, we'd have to look at the other types of rules, to see if
this pattern holds up.
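
As a rough illustration of the "same information" claim, the sketch
below expands a positional indent vector into one match-style rule per
token. The function name and the Python tuple encoding of the rules are
hypothetical; only the (match nil PARENT FIELD) parent OFFSET shape is
taken from the examples above.

```python
def expand_to_match_rules(production, fields, action):
    """Turn a positional indent vector into per-field match rules.

    production -- name of the grammar production (the parent node type)
    fields     -- field name of each token, None if the token is unnamed
    action     -- one offset per token; None stands in for wisi's nil
    """
    rules = []
    for position, (name, offset) in enumerate(zip(fields, action)):
        if offset is None:
            continue  # nothing specified for this token
        if name is None:
            # A match rule refers to tokens by name, so an unnamed
            # token cannot be expressed this way.
            raise ValueError(
                f"token {position} of {production} needs a field name")
        rules.append((("match", None, production, name), "parent", offset))
    return rules


rules = expand_to_match_rules(
    "function_definition",
    [None, None, "declarator", "body"],
    [None, None, None, 0])
```

Here rules holds the single match rule for 'body'. Passing the unnamed
call_expression vector [nil 2] raises instead, which is the naming
requirement mentioned above.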

I think you were biased by the "matching" rules tree-sitter supports.
That approach is reasonable when you only want to specify information
for a few nodes in the tree. Wisi assumes you want to specify indent
information for most of the nodes in the tree, so it supports a
tree-traversal model instead. Tree-sitter does support tree traversal,
but doesn't provide an easy way to add information for each node, as the
wisi indent-action syntax does.
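
A minimal sketch of the tree-traversal model, again in Python and under
my own assumptions: the Node class and the ACTIONS table are
hypothetical, the table covers only the function_definition variant
without ms_call_modifier, and offsets are taken relative to the column
where the parent node starts.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str
    line: int            # line on which the node starts
    column: int          # column at which the node starts
    first_on_line: bool  # True if the node begins its line
    children: list = field(default_factory=list)


# One offset per child position; None means "not specified here".
ACTIONS = {
    "function_definition": [None, None, 0],
    "call_expression": [None, 2],
}


def compute_indents(node, indents=None):
    """Walk the whole tree, recording an indent for each line whose
    first node has an offset in its parent's action vector."""
    if indents is None:
        indents = {}
    offsets = ACTIONS.get(node.kind, [])
    for child, offset in zip(node.children, offsets):
        if offset is not None and child.first_on_line:
            indents[child.line] = node.column + offset
    for child in node.children:
        compute_indents(child, indents)
    return indents


fn = Node("function_definition", 1, 0, True, [
    Node("declaration_specifiers", 1, 0, True),
    Node("declarator", 1, 4, False),
    Node("compound_statement", 2, 0, True),
])
```

Calling compute_indents(fn) visits every node once and looks up its
action by node type, rather than testing each line against a list of
match rules.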

-- 
-- Stephe


