bug#61529: 30.0.50; tree-sitter: weird off-by-one error but only in css-

From: Mickey Petersen
Subject: bug#61529: 30.0.50; tree-sitter: weird off-by-one error but only in css-ts-mode(?) with `treesit-node-at'
Date: Thu, 16 Feb 2023 19:48:06 +0000
User-agent: mu4e @VERSION@; emacs 30.0.50

Theodor Thornhill <theo@thornhill.no> writes:

> Mickey Petersen <mickey@masteringemacs.org> writes:
>> Theodor Thornhill <theo@thornhill.no> writes:
>>> Mickey Petersen <mickey@masteringemacs.org> writes:
>>>> Eli Zaretskii <eliz@gnu.org> writes:
>>>>>> From: Mickey Petersen <mickey@masteringemacs.org>
>>>>>> Date: Wed, 15 Feb 2023 08:25:53 +0000
>>>>>> With point at '2', then I'd expect `treesit-node-at' to yield that node. 
>>>>>> But it does not:
>>>>>> (cons (point) (treesit-node-at (point)))
>>>>>> => (34 . #<treesit-node "(" in 34-35>)
>>>>> The value of point is the number of the character which _follows_
>>>>> point, yes?  So when the cursor is on '2', point is actually between
>>>>> '(' and '2'.  Right?  What does this mean in terms of the node that
>>>>> should be returned by tree-sitter?
>>>> Correct, point is between '(' and '2'. So 34-35 means it occupies
>>>> position 34-35 or [34,35). So point is outside the scope of the '('
>>>> single-char anonymous node.
>>>> Or at least it should be: the problem is that it *is* inside it in
>>>> this one weird instance and, near as I can find, only in this mode,
>>>> and then only in this place, it isn't. I suspect `treesit-node-at' has
>>>> a bug.
>>> Hi, Mickey!
>> Hey Theo!
>>>> Consider:
>>>>     a {
>>>>       background: linear-gradient(210deg, rgba(|255,82,41,1) 0%, rgba(251,165,85,1) 54%, rgba(163,73,73,1) 100%);
>>>>     }
>>>> Note the new position of point in rgba. `treesit-node-at` with
>>>> `(point)` now correctly returns
>>>>     #<treesit-node integer_value in 48-51>
>>>> Move point back one position:
>>>>     a {
>>>>       background: linear-gradient(210deg, rgba|(255,82,41,1) 0%, rgba(251,165,85,1) 54%, rgba(163,73,73,1) 100%);
>>>>     }
>>>> And now:
>>>>   (treesit-node-at (point)) => #<treesit-node "(" in 47-48>
>>>> In stark contrast to the original example.
>>> So the docstring of treesit-node-at states:
>>>   "Return the leaf node at position POS.
>>> A leaf node is a node that doesn't have any child nodes.
>>> The returned node's span covers POS: the node's beginning is before
>>> or at POS, and the node's end is at or after POS.
>>> If no leaf node's span covers POS (e.g., POS is on whitespace
>>> between two leaf nodes), return the first leaf node after POS.
>>> If there is no leaf node after POS, return the first leaf node
>>> before POS.
>>> Return nil if no leaf node can be returned.  If NAMED is non-nil,
>>> only look for named nodes."
>>> Doesn't this describe this behavior?
>> It's a good question: I suppose it's a question of wording (or
>> understanding) more than it necessarily being *wrong* -- it is, after
>> all, a custom function.
>> Given how node boundaries work, I read "*end is at* or after POS" to
>> mean that point is wholly contained in the node "(", which, due to how
>> tree-sitter determines node extents, it technically isn't.
>> But I think it's fair enough if this is intentional -- I've no real
>> suggestions for improving its behaviour if this is intended. So if
>> it's working as expected, then it's safe to close the issue.
> There is one thing here which confuses me a lot and that you might also
> have some thoughts on. Consider some simple tsx:
> ```
> const x = () => (
>   <div>
>     try to C-SPC C-SPC at the beginning of try after activating treesit-explore-mode
>   </div>
> )
> ```
> Now you can maybe see that the jsx_text node covers a lot more than just
> the line in the middle.  There are some other cases like this in some
> languages, and they do trip up our semantics. May this be one similar
> such case, just not concerning indentation in this case?

Yeah, that's just how XML is, which is more or less what JSX is based
on. So it doesn't seem too surprising to me that, in this particular
case, the jsx_text node has extra whitespace around the text, even if
that whitespace is insignificant in SGML. (Though some XML parsers do
trim an element's leading and trailing whitespace...)

> IOW, sometimes the parser also returns nodes including whitespace, so it
> looks like we are outside a node, but we're not yet.
> Theo
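
For what it's worth, the two readings of the docstring discussed above
can be compared with a minimal sketch. This is plain Python, not Emacs
code, and it is not how `treesit-node-at' is implemented; the helper
names are made up for illustration. It only takes the two spans quoted
earlier in this thread ("(" in 47-48 and integer_value in 48-51) and
lists which leaves "cover" a position when the end is inclusive (as the
docstring words it) and when it is exclusive (the [start,end) reading):

```python
from typing import List, Tuple

# (node type, start, end), copied from the spans quoted in this thread.
Leaf = Tuple[str, int, int]

LEAVES: List[Leaf] = [
    ("(", 47, 48),
    ("integer_value", 48, 51),
]


def covering_closed(leaves: List[Leaf], pos: int) -> List[str]:
    """Docstring wording: start is before or at POS, end is at or after POS."""
    return [name for name, start, end in leaves if start <= pos <= end]


def covering_half_open(leaves: List[Leaf], pos: int) -> List[str]:
    """Half-open reading: a node occupies [start, end), so end is excluded."""
    return [name for name, start, end in leaves if start <= pos < end]


if __name__ == "__main__":
    for pos in (47, 48):
        print(pos, covering_closed(LEAVES, pos), covering_half_open(LEAVES, pos))
```

At position 48 the inclusive reading has two candidate leaves while the
half-open reading has exactly one, which is the ambiguity at a shared
boundary. The sketch says nothing about which candidate
`treesit-node-at' actually picks.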
