emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Reliable after-change-functions (via: Using incremental parsing in E


From: Stephen Leake
Subject: Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
Date: Fri, 03 Apr 2020 09:45:44 -0800
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.2 (windows-nt)

Eli Zaretskii <address@hidden> writes:

>> From: Stephen Leake <address@hidden>
>> Date: Thu, 02 Apr 2020 18:27:59 -0800
>> 
>> > Such copying is not really scalable, and IMO should be avoided.
>> > During active editing, redisplay runs very frequently, and having to
>> > copy portions of the buffer, let alone all of it, each time, which
>> > necessarily requires memory allocation, consing of Lisp objects, etc.,
>> > will produce significant memory pressure, expensive heap
>> > allocations/deallocations, and a lot of GC.  Recall that on many
>> > modern platforms Emacs doesn't really return memory to the system,
>> > which means we risk increasing the memory footprint, and create
>> > system-wide memory pressure.  It isn't a catastrophe, but we should
>> > try to avoid it if possible.
>> 
>> Ok. I know very little about the internal storage of text in Emacs.
>> There is at least two strings with a gap at the current edit point; if
>> we pass a simple pointer to tree-sitter, it will have to handle the gap.
>
> Tree-sitter allows the application to define a "reader" function that
> it will then call to access buffer text.  That function should cope
> with the gap.

and also with the encoding, which you did not address. I don't see how
that is different from the C level buffer-substring. Certainly there
should be a module function buffer-substring that is as efficient as possible.

>> You mention "consing of Lisp objects" above, which says to me that the
>> text is stored in a more complex structure.
>
> I meant the consing that is necessary to make a buffer-substring that
> will be passed to the parser.

Since are are calling the parser from C (if it is linked into Emacs, or
in a module), I still don't understand. Does C code have to cons to
create a string? It will have to allocate if the requested range is not
contiguous in the buffer.

>> Avoid _all_ copying is impossible; the parser must store the contents of
>> each token in some way. Typically that is done by storing
>> pointers/indices into the text buffer that contains the entire text.
>
> I don't think tree-sitter does that, because the text it gets is
> ephemeral.  If we pass it a buffer-substring, it's a temporary string
> which will be GCed after it's used; if we pass it pointers to buffer
> text, those pointers can be invalid after GC, because GC can relocate
> buffer text to a different memory region.

Hmm.
https://tree-sitter.github.io/tree-sitter/using-parsers#providing-the-code
says:

    Syntax nodes store their position in the source code both in terms
    of raw bytes and row/column coordinates

In the case of passing a pointer to a string (or buffer, etc), those
positions are relative to that original buffer. So the Emacs buffer is
serving as the parse buffer. Ok, that avoids any copying.

If we pass a buffer-substring to the parser, we are then responsible for
mapping positions relative to the substring into positions relative to
the full buffer. wisi delegates that to the parser; it can pass
start-char-pos and start-byte-pos to the parser along with a string.


>> >> In sum, the short answer is "yes, you must parse the whole file, unless
>> >> your language is particularly simple".
>> >
>> > Funny, my conclusion from reading your detailed description was
>> > entirely different.
>> 
>> I need more than that to respond in a helpful way.
>
> Well, you said:
>
>> To some extent, that depends on the language.
>
> and then went on to describing how each language might _not_ need a
> full parse in many cases.  Thus the conclusion sounded a bit radical
> to me.

Ok, we are putting different spins on what "particularly simple" means.

A more neutral phrasing would be:

    Some languages require parsing the whole file, some do not.
    
-- 
-- Stephe



reply via email to

[Prev in Thread] Current Thread [Next in Thread]