emacs-devel


From: Yuan Fu
Subject: Re: treesitter local parser: huge slowdown and memory usage in a long file
Date: Sun, 18 Feb 2024 21:53:45 -0800


> On Feb 17, 2024, at 7:37 PM, Dmitry Gutov <dmitry@gutov.dev> wrote:
> 
> On 13/02/2024 10:08, Yuan Fu wrote:
> 
>>> On 12/02/2024 06:16, Yuan Fu wrote:
>>>> Thanks, the culprit is the call to treesit-update-ranges in
>>>> treesit--pre-redisplay, where we don’t pass it any specific range, so it
>>>> updates the range for the whole buffer. Eli, is there any way to get a
>>>> rough estimate of the range that redisplay is refreshing? Do you think
>>>> something like this would work?
>>> 
>>> If we don't update the ranges outside of some interval surrounding the 
>>> window, what does that mean for correctness?
>> If the place of update and the embedded code currently in view belong to the 
>> same node in the host language, then when we update ranges for the current 
>> window-visible range, the whole node’s range is updated. So at least for 
>> this node, the range is correct.
>> If the place of update and the embedded code currently in view belong to 
>> different nodes in the host language, then when we update ranges for the 
>> current window-visible range, only the visible node’s range is updated.
> 
> Okay. What about positions after the visible part of the buffer? Can their 
> ranges be outdated? It's probably okay when the ranges are only used for 
> font-lock and syntax-ppss, but I wonder about possible other applications 
> (reindenting the whole buffer, for example).

It’s the same as positions before the visible part. For reindenting the whole 
buffer, treesit-indent-region will update the range for the whole buffer at the 
very beginning.

> 
>>> 
>>> Perhaps the mode has a syntax-propertize-function which behaves differently 
>>> (as it should) depending on the language at point. Or different ranges have 
>>> different syntax tables, something like that.
>>> 
>>> If the ranges, after some edit (perhaps a programmatic one, performed far
>>> from the visible area), are not kept up to date somewhere around the
>>> beginning of the buffer, do we not risk confusing the syntax-ppss parser,
>>> for example?
>> That can happen, yes.
>>> 
>>> Come to think of it, take treesit-indent: it only updates the ranges for 
>>> the current line. But the line's indentation usually depends on the 
>>> previous buffer positions, doesn't it?
>> The range passed to treesit-update-ranges acts as an intersecting range: we
>> capture the nodes that intersect with the range and use them to update
>> ranges. If the line to be indented is in an embedded language block, the
>> whole block will be captured and its range will be given to the embedded
>> language parser.
>> We haven’t had any problems so far, mainly because most embedded code blocks
>> are local, and it’s rare for an edit far from the visible portion to affect
>> ranges in a way that the user expects to show up in the currently visible
>> range.
>> I don’t have any great idea for a better way to update ranges right now. Let 
>> me think about that. In the meantime, I’ll push a temporary fix so V’s 
>> original problem can be solved.
> 
> I was thinking (since considering the same problem in mmm-mode, actually) 
> that it would make sense to either plug into syntax-propertize-function, or 
> have a parallel data structure similarly tracking the outdated buffer 
> regions, which would only update the part of the buffer which had been 
> modified since last time.
> 
> Dealing with the "remainder" of the buffer might be trickier, but maybe some 
> heuristic which would help detect the "no changes" case could be implemented.

Yeah, something similar to syntax-ppss or jit-lock. Or maybe it can be avoided,
since the on-demand range update had been working fine until we added
treesit--pre-redisplay for syntax-ppss.

Yuan

