emacs-devel
Re: [SPAM UNSURE] Tree-sitter api


From: Yuan Fu
Subject: Re: [SPAM UNSURE] Tree-sitter api
Date: Thu, 26 Aug 2021 22:18:08 -0700

Thank you very much for spending time on this :-)

> On Aug 24, 2021, at 7:59 AM, Stephen Leake <stephen_leake@stephe-leake.org> 
> wrote:
> 
> Yuan Fu <casouri@gmail.com> writes:
> 
>>> 
>>> ada-mode takes the approach of embedding the indent rules directly in
>>> the grammar, and the functions that do that provide a few more options
>>> than yours. To see the definition of those functions, you'll have to
>>> install the wisi package, and look in wisi.info, section Grammar
>>> actions. (it would be nice if that info/html file was linked from the
>>> GNU ELPA package page; I'll start a new thread for that).
>> 
>> I had a cursory look at the manual for indent in wisi and have some
>> questions. Why does wisi indent from “low-level productions”? 
> 
> The indent of every new-line must be specified; low level productions
> can contain new-lines.

Ah, I see. What I did was to find the “largest” node that starts at BOL and try 
to match that. IIUC, wisi starts from the “smallest” entity and goes up (by 
getting its parent repeatedly) until it finds one with a non-nil indent rule?
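
As a toy sketch of the two strategies: the code below uses plain plists rather than the real tree-sitter or wisi API, every `my/` name is made up for illustration, and the second function only encodes the guess above about how wisi climbs the tree.

```elisp
(require 'seq)

;; Toy stand-in for a syntax tree: a node followed by its ancestors,
;; innermost first.  This is NOT the tree-sitter or wisi API.
(defvar my/chain
  '((:type identifier :start 10)
    (:type call_expression :start 10)
    (:type expression_statement :start 10)
    (:type compound_statement :start 2)))

(defun my/largest-node-at-bol (chain bol)
  "Return the outermost node in CHAIN whose :start equals BOL."
  (car (last (seq-filter (lambda (node) (= (plist-get node :start) bol))
                         chain))))

(defun my/innermost-node-with-rule (chain rules)
  "Return the innermost node in CHAIN whose :type has an entry in RULES.
RULES is a hypothetical alist of (NODE-TYPE . INDENT)."
  (seq-find (lambda (node) (assq (plist-get node :type) rules))
            chain))

(plist-get (my/largest-node-at-bol my/chain 10) :type)
;; => expression_statement
(plist-get (my/innermost-node-with-rule my/chain '((call_expression . 2)))
           :type)
;; => call_expression
```

The same ancestor chain gives a different node to match depending on whether the search takes the outermost node at BOL or stops at the first ancestor that has a rule.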

[snip]

> So your syntax for indent is much more verbose than the wisi syntax
> (because each token gets a separate rule), but specifies the same
> information.
> 
> Your syntax also requires naming each token that is referenced in an
> indent rule; wisitoken can use token position to do that, which is the
> main reason indent is specified directly in the grammar file; it's very
> easy to associate each indent expression with the corresponding token,
> without having to make up names for the tokens.

> Here are the above
> wisitoken productions without the token names:
> 
>  function_definition : [ms_call_modifier] declaration_specifiers
>    declarator compound_statement
>    {(wisi-indent-action [nil nil nil 0])}
> 
>  call_expression : expression argument_list
>    {(wisi-indent-action [nil 2])}
> 
> To be fair, we'd have to look at the other types of rules, to see if
> this pattern holds up.

I tried, and all the rules can be translated into wisi’s style. However, the 
result is as verbose as the previous version. My idea is to write out match 
patterns (similar to those in wisi) and give names to the interesting nodes (so 
we use names as opposed to positions). Then, if any matched node happens to be 
the node at point, we use that node’s corresponding indent rule to indent. In 
the indent rule, we can refer to other matched nodes. For example, in the indent 
rule of list_rest, the anchor is list_first.

Maybe there are better ways to implement this, but at its current stage I don’t 
think this is better than tree-sitter-simple-indent.

I think part of the reason why wisi’s indent rules can be succinct is that they 
are written alongside the grammar definition. It is hard to make tree-sitter’s 
indent rules as succinct while keeping them easy to understand.

(defvar tree-sitter-query-indent-rules
  '((tree-sitter-c
     "(function_definition body: (_) @body)

(field_declaration_list) @field_decl

(call_expression (_) @call_child)

(if_statement
 (condition) @if_cond
 (consequence) @if_cons
 (alternative) @if_alt
 \"else\" @else)

(switch_statement
 (condition) @switch_cond)

(case_statement
 (_) @case-child) @case

(compound_statement) @lbracket
\"}\" @rbracket

(compound_statement
 . (_) @list_first
 (_)* @list_rest)

(initializer_list
 . (_) @list_first
 (_)* @list_rest)

(argument_list
 . (_) @list_first
 (_)* @list_rest)

(parameter_list
 . (_) @list_first
 (_)* @list_rest)

(field_declaration_list
 . (_) @list_first
 (_)* @list_rest)
"
     (body parent 0)
     (field_decl parent 0)
     (call_child parent 2)
     (if_cond parent 2)
     (if_cons parent 2)
     (if_alt parent 2)
     (switch_cond parent 2)
     (else parent 0)
     (case parent 0)
     (case-child parent 2)
     (lbracket parent 2)
     (rbracket parent 0)
     (list_first parent 2)
     (list_rest list_first 0)))
  "A list of indent rule settings.
Each indent rule setting should be

    (LANGUAGE PATTERN INDENT INDENT...)

where LANGUAGE is a language symbol, PATTERN is a query pattern
string, and each INDENT is a list

    (CAPTURE_NAME ANCHOR OFFSET)

If a captured node matches the node at point, Emacs looks for an
INDENT that has a matching CAPTURE_NAME, and uses the ANCHOR and
OFFSET of that INDENT to indent the current line.

ANCHOR should be a capture name that captures another node in
PATTERN.  Emacs finds the column of that node, adds OFFSET to it,
and indents the current line to that column.

TODO: examples in manual")
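
A minimal sketch of the lookup the docstring describes, assuming the columns of the anchor nodes are already known. The function name and the COLUMNS alist are hypothetical and only illustrate the (CAPTURE_NAME ANCHOR OFFSET) arithmetic; they are not part of the actual implementation.

```elisp
(defun my/indent-column (capture rules columns)
  "Return the target column for a node captured as CAPTURE, or nil.
RULES is a list of (CAPTURE_NAME ANCHOR OFFSET) entries.  COLUMNS is a
hypothetical alist mapping each anchor name to the column of the node
it stands for."
  (let* ((rule (assq capture rules))
         (anchor-column (and rule (alist-get (nth 1 rule) columns))))
    ;; No matching rule, or an unknown anchor, means no opinion.
    (when anchor-column
      (+ anchor-column (nth 2 rule)))))

(let ((rules '((list_first parent 2)
               (list_rest list_first 0)))
      (columns '((parent . 4) (list_first . 6))))
  (list (my/indent-column 'list_first rules columns)
        (my/indent-column 'list_rest rules columns)))
;; => (6 6)
```

With the parent at column 4, list_first lands at column 6, and list_rest lines up with list_first because its offset is 0.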

> 
> I think you were biased by the "matching" rules tree-sitter supports.
> That approach is reasonable when you only want to specify information
> for a few nodes in the tree. Wisi assumes you want to specify indent
> information for most of the nodes in the tree, so it supports a
> tree-traversal model instead.

I assumed that the indent rule for most nodes would be something basic, like 
“same as previous line”, and we only need to specify indent rules for some 
“special” nodes. 

IIUC, the tree-traversal method that you mentioned goes bottom-up, matches on 
each level (in tree-sitter terms), and accumulates an indent delta for each 
matched indent rule. Is that right? Does wisi go all the way up to the 
top level?
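
A toy sketch of that reading, assuming each ancestor contributes at most one delta and the walk continues to the root. Whether wisi actually works this way is exactly the open question, and the names below are made up.

```elisp
(defun my/accumulated-indent (chain deltas)
  "Sum the indent deltas of every node type in CHAIN.
CHAIN lists node types from the node at point up to the root.  DELTAS
is a hypothetical alist of (NODE-TYPE . DELTA); types without an entry
contribute 0."
  (apply #'+ (mapcar (lambda (type) (alist-get type deltas 0))
                     chain)))

(my/accumulated-indent
 '(call_expression compound_statement function_definition translation_unit)
 '((call_expression . 2) (compound_statement . 2)))
;; => 4
```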

> Tree-sitter does support tree traversal,
> but doesn't provide an easy way to add information for each node, as the
> wisi indent-action syntax does.

Yes, I would still need to use a match pattern and name each node that I want 
to specify an indent delta for. There is no way to specify indent by position 
in the match pattern without naming each node.

Yuan

