emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Reliable after-change-functions (via: Using incremental parsing in E


From: Eli Zaretskii
Subject: Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
Date: Wed, 01 Apr 2020 22:33:05 +0300

> From: Tuấn-Anh Nguyễn <address@hidden>
> Date: Thu, 2 Apr 2020 00:55:45 +0700
> Cc: address@hidden
> 
> > Did you consider using the API where an application can provide a
> > function to return text at a given offset?  Such a function could be
> > relatively easily implemented for Emacs.
> >
> 
> I don't understand what you mean. Below I'll explain how it works
> currently.  [...]  If dynamic modules have direct access to the
> buffer text, none of the above is an issue.
> 
> Such direct access can be enabled by something like this:
> 
>     char* (*access_buffer_text) (emacs_env *env,
>                                  emacs_value buffer,
>                                  ptrdiff_t byte_offset,
>                                  ptrdiff_t *size_inout);
> 
> Of course, such an API would require extensive documentation on how it
> must be used, to ensure safety and correctness.

I think you are moving too fast, and keep the current implementation
in sight too much.

What I suggest is to step back and see how such direct access, if it
were available, could be used with tree-sitter.  Let's forget about
modules for a moment and consider tree-sitter linked with Emacs and
capable of calling any C function in core.  How would you use that?

Buffer text is not exactly UTF-8, it's a superset of UTF-8.  So one
question to answer is what to do with byte sequences that are not
valid UTF-8.  Any suggestions or ideas?  How does tree-sitter handle
invalid byte sequences in general?

Also, direct access to buffer text generally means we must make sure
GC never runs as long as pointers to buffer text are lying around.
Can any Lisp run between calls to the reader function that the
tree-sitter parser calls to access the buffer text?  If so, we need to
take care of that issue.

Next, I'm still asking whether parsing the whole buffer when it is
first created is necessary.  Can we pass to the parser just a small
chunk (say, 500 bytes) of the buffer around the window-full to be
displayed next?  If this presents problems, what are those problems?

IOW, the issue with exposing access to buffer text to modules is IMO
secondary.  My suggestion is first to figure out how to do this stuff
efficiently from within Emacs itself, as if the module interface were
not part of the equation.  We can add that aspect back later.

And yes, doing this by consing strings is not a good idea, it will
slow things down and cause a lot of GC.  It is best avoided.  Thus my
questions above.

> > Btw, what do you do with the tree returned by the tree-sitter parser?
> > store it in some buffer-local variable?  If so, how much memory does
> > such a tree take, and when, if ever, is that memory released?
> >
> 
> It's stored in a buffer-local variable. I haven't measured the memory
> they take. Memory is released when the tree object is garbage-collected
> (it's a `user-ptr').

So if I have many hundreds of buffers, I could have such a tree in
each one of them indefinitely?  Perhaps that's one more design issue
to consider, given that the parsing is so fast.  Similar to what we do
with image and face caches -- we flush them from time to time, to keep
the memory footprint in check.  So a buffer that was not current more
than some time interval ago could have its tree GCed.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]