bug-gnu-emacs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#60237: 30.0.50; tree sitter core dumps when I edebug view a node


From: Mickey Petersen
Subject: bug#60237: 30.0.50; tree sitter core dumps when I edebug view a node
Date: Mon, 27 Feb 2023 14:29:52 +0000
User-agent: mu4e @VERSION@; emacs 30.0.50

Yuan Fu <casouri@gmail.com> writes:

>> On Feb 27, 2023, at 12:22 AM, Mickey Petersen <mickey@masteringemacs.org> 
>> wrote:
>>
>>
>> Yuan Fu <casouri@gmail.com> writes:
>>
>>>> On Feb 26, 2023, at 1:41 AM, Mickey Petersen <mickey@masteringemacs.org> 
>>>> wrote:
>>>>
>>>>
>>>> Yuan Fu <casouri@gmail.com> writes:
>>>>
>>>>>> GC has historically never called xmalloc, so the profiler will
>>>>>> likely
>>>>>> crash upon growing the mark stack as well.  I guess another
>>>>>> important
>>>>>> question is why ts_delete_parser is calling xmalloc.
>>>>>>
>>>>>
>>>>>> As you see, when we call ts_tree_delete, it calls
>>>>>> ts_subtree_release,
>>>>>> which in turn calls malloc (redirected into our xmalloc).  Is this
>>>>>> expected?  Can you look in the tree-sitter sources and verify that
>>>>>> this is OK?
>>>>>
>>>>> I had a look, and it seems legit. In tree-sitter, a TSTree (or more
>>>>> precisely, a Subtree) is just some inlined data plus a refcounted
>>>>> pointer to the complete data. This way multiple trees share common
>>>>> subtrees/nodes. Eg, when incrementally parsing, you pass in an old
>>>>> tree and get a new tree, these two trees will share the unchanged part
>>>>> of the tree.
>>>>
>>>> Would that mean we could possibly preserve node instances -- either
>>>> the real TS ones, or an Emacs-created facsimile -- between
>>>> incremental parsing? That would be useful for refactoring.
>>>
>>> What kind of exact interface (function) do you want? The
>>> treesit-node-outdated error is solely Emacs’s product, tree-sitter
>>> itself doesn’t mark a node outdated. It is possible for Emacs to not
>>> delete the old tree and give it to you, or allow you to access
>>> information of an outdated node.
>>
>> OK, so let me explain:
>>
>> Touching the buffer for any reason invalidates the whole tree; that's
>> not good. It's not good, because a lot of the information may still be
>> useful and viable. Outdating the node is not a bad idea as it avoids a
>> lot of 'traps' around accidental modifications that can corrupt things
>> without the developer's knowledge.
>>
>> I'd like to be able to access all the information possible; perhaps
>> behind a flag variable like `treesit-allow-outdated-node-access'. What
>> I'm really mostly interested in is:
>>
>> - How well the node references handle changes in byte positions in TS.
>
> They don’t handle position changes. If the buffer content changed, we
> need to reparse. Once we reparsed the buffer, a new tree is
> born. While it is true that the new tree shares some node with the old
> tree, tree-sitter does not expose any function or information that
> tells you which node in the new tree is “the same” as which node in
> the old tree; nor does it tell you whether a node in the old tree
> still “exists” in the new tree.
>
> Now, there does exist a function (in tree-sitter’s API) that allows
> you to “edit” a node with position changes. But a) I’m not sure how
> does it handle the case where the node is deleted by the change and b)
> it is not very useful because once you reparse the buffer, the new
> tree is completely independent from the old tree (ignoring the
> implementation detail which is not exposed).
>
>>
>> - Does changing something at X shift (like a `point-marker`) everything
>> below it? Does an outdated node correctly reference its new location
>> and state, such as changes to children or its position in the tree?
>
> Like I said above, any buffer change will create a new tree with no relation 
> to the old tree, so there is no shifting.
>
> And there really isn’t a “new location”: we don’t know if the old node
> is still in the new tree. Mind you, even if the node is completely
> outside of the changed region, it can still disappear from the new
> tree because of change of its surrounding context. For example, in the
> following C code:
>
> /*
> int c = 1;
>
> If I insert a closing comment delimiter, and buffer becomes
>
> /*
> int c = 1;
> */
>
> Even though int c = 1; is not in the changed range, nor did it’s
> position move, all those nodes (int, c, =, etc) are not in the new
> tree anymore, because the whole thing becomes a comment.
>
> I made any access to outdated nodes error because there really isn’t
> any good reason to use them, at least I didn’t think of any at the
> time. And make them error out should help people catch errors.
>
>>
>> Right now, Combobulate can make a proxy node, which essentially
>> captures the basics of a live node and stores it in a defstruct. That
>> way I can at least retain the start/end, type, text, etc. of a node
>> and still do light refactoring without contorting myself to do things
>> in a particular order, which is not always possible (like delaying
>> editing to the very end.)
>
> IIUC, you want to do some very minor whitespace edit to the buffer
> which doesn’t really change the parse tree, so you don’t want the
> nodes to be invalidated for no good reason? Not erroring on outdated
> nodes is easy. As you said, we can add a
> treesit-inhibit-error-outdated variable. But not it’s not so easy to
> automatically update outdated nodes’ positions (with aforementioned
> tree-sitter function). However, if you are making those changes, you
> much know how to adjust your nodes position, right? So maybe it isn’t
> a must-have for your purpose.

It's a good point, but it's also easy to create a scenario where you
at least want to keep the position and esp. the type and text (for
reporting information to the user, or similar.)

My main interest is now refactoring and how to best do it. If TS can
do some of it, then all the better. I realise it was never meant to,
but if we can continue accessing the information contained in a node
even if it is outdated, then that could be useful, however niche.

Currently I use overlays and point markers, but they are not
infallible.






reply via email to

[Prev in Thread] Current Thread [Next in Thread]