bug-gnu-emacs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#60237: 30.0.50; tree sitter core dumps when I edebug view a node


From: Yuan Fu
Subject: bug#60237: 30.0.50; tree sitter core dumps when I edebug view a node
Date: Mon, 27 Feb 2023 01:05:49 -0800


> On Feb 27, 2023, at 12:22 AM, Mickey Petersen <mickey@masteringemacs.org> 
> wrote:
> 
> 
> Yuan Fu <casouri@gmail.com> writes:
> 
>>> On Feb 26, 2023, at 1:41 AM, Mickey Petersen <mickey@masteringemacs.org> 
>>> wrote:
>>> 
>>> 
>>> Yuan Fu <casouri@gmail.com> writes:
>>> 
>>>>> GC has historically never called xmalloc, so the profiler will
>>>>> likely
>>>>> crash upon growing the mark stack as well.  I guess another
>>>>> important
>>>>> question is why ts_delete_parser is calling xmalloc.
>>>>> 
>>>> 
>>>>> As you see, when we call ts_tree_delete, it calls
>>>>> ts_subtree_release,
>>>>> which in turn calls malloc (redirected into our xmalloc).  Is this
>>>>> expected?  Can you look in the tree-sitter sources and verify that
>>>>> this is OK?
>>>> 
>>>> I had a look, and it seems legit. In tree-sitter, a TSTree (or more
>>>> precisely, a Subtree) is just some inlined data plus a refcounted
>>>> pointer to the complete data. This way multiple trees share common
>>>> subtrees/nodes. Eg, when incrementally parsing, you pass in an old
>>>> tree and get a new tree, these two trees will share the unchanged part
>>>> of the tree.
>>> 
>>> Would that mean we could possibly preserve node instances -- either
>>> the real TS ones, or an Emacs-created facsimile -- between
>>> incremental parsing? That would be useful for refactoring.
>> 
>> What kind of exact interface (function) do you want? The
>> treesit-node-outdated error is solely Emacs’s product, tree-sitter
>> itself doesn’t mark a node outdated. It is possible for Emacs to not
>> delete the old tree and give it to you, or allow you to access
>> information of an outdated node.
> 
> OK, so let me explain:
> 
> Touching the buffer for any reason invalidates the whole tree; that's
> not good. It's not good, because a lot of the information may still be
> useful and viable. Outdating the node is not a bad idea as it avoids a
> lot of 'traps' around accidental modifications that can corrupt things
> without the developer's knowledge.
> 
> I'd like to be able to access all the information possible; perhaps
> behind a flag variable like `treesit-allow-outdated-node-access'. What
> I'm really mostly interested in is:
> 
> - How well the node references handle changes in byte positions in TS.

They don’t handle position changes. If the buffer content changed, we need to 
reparse. Once we reparsed the buffer, a new tree is born. While it is true that 
the new tree shares some node with the old tree, tree-sitter does not expose 
any function or information that tells you which node in the new tree is “the 
same” as which node in the old tree; nor does it tell you whether a node in the 
old tree still “exists” in the new tree.

Now, there does exist a function (in tree-sitter’s API) that allows you to 
“edit” a node with position changes. But a) I’m not sure how does it handle the 
case where the node is deleted by the change and b) it is not very useful 
because once you reparse the buffer, the new tree is completely independent 
from the old tree (ignoring the implementation detail which is not exposed).

> 
> - Does changing something at X shift (like a `point-marker`) everything
> below it? Does an outdated node correctly reference its new location
> and state, such as changes to children or its position in the tree?

Like I said above, any buffer change will create a new tree with no relation to 
the old tree, so there is no shifting.

And there really isn’t a “new location”: we don’t know if the old node is still 
in the new tree. Mind you, even if the node is completely outside of the 
changed region, it can still disappear from the new tree because of change of 
its surrounding context. For example, in the following C code:

/*
int c = 1;

If I insert a closing comment delimiter, and buffer becomes

/*
int c = 1;
*/

Even though int c = 1; is not in the changed range, nor did it’s position move, 
all those nodes (int, c, =, etc) are not in the new tree anymore, because the 
whole thing becomes a comment.

I made any access to outdated nodes error because there really isn’t any good 
reason to use them, at least I didn’t think of any at the time. And make them 
error out should help people catch errors.

> 
> Right now, Combobulate can make a proxy node, which essentially
> captures the basics of a live node and stores it in a defstruct. That
> way I can at least retain the start/end, type, text, etc. of a node
> and still do light refactoring without contorting myself to do things
> in a particular order, which is not always possible (like delaying
> editing to the very end.)

IIUC, you want to do some very minor whitespace edit to the buffer which 
doesn’t really change the parse tree, so you don’t want the nodes to be 
invalidated for no good reason? Not erroring on outdated nodes is easy. As you 
said, we can add a treesit-inhibit-error-outdated variable. But not it’s not so 
easy to automatically update outdated nodes’ positions (with aforementioned 
tree-sitter function). However, if you are making those changes, you much know 
how to adjust your nodes position, right? So maybe it isn’t a must-have for 
your purpose.


Yuan






reply via email to

[Prev in Thread] Current Thread [Next in Thread]