bug-gnu-emacs


From: Dmitry Gutov
Subject: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Thu, 26 Jan 2023 19:15:51 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2

On 26/01/2023 10:10, Eli Zaretskii wrote:
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Wed, 25 Jan 2023 23:17:25 -0800
>> Cc: Dmitry Gutov <dgutov@yandex.ru>,
>>   60953@debbugs.gnu.org
>>
>>>> Switching to using :pred with function (like I did in commit
>>>> d94dc606a0934) which still uses buffer-substring inside is significantly
>>>> faster.
>>> If the performance issue is fixed, then the only aspect that we should
>>> perhaps try to improve is consing.  Consing a string each time you
>>> need to fontify increases the GC pressure, so if there's a good way of
>>> avoiding that without performance degradation, we should take it.  Is
>>> it possible to use your :pred technique in a way that doesn't need to
>>> produce strings from buffer text?
>> Why is :pred more performant though? They just use string-match-p. If
>> anything, the :pred predicates should be more expensive, since they
>> execute lisp functions and conses tree-sitter nodes into lisp objects.
> Yes, exactly my thoughts.
>
> Perhaps Dmitry could present comparison of profiles from perf which
> would allow us to understand the reason(s)?

I believe I did that in the second message in this thread: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=60953#8

To quote the specific profiles, it's

  15.30%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_current_status
  14.92%  emacs         emacs                       [.] process_mark_stack
   9.75%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_goto_next_sibling
   8.90%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_goto_first_child
   3.87%  emacs         libtree-sitter.so.0.0       [.] ts_node_start_point

for :pred vs.

  23.72%  emacs         emacs                    [.] process_mark_stack
  12.33%  emacs         libtree-sitter.so.0.0    [.] ts_tree_cursor_current_status
   7.96%  emacs         libtree-sitter.so.0.0    [.] ts_tree_cursor_goto_next_sibling
   7.38%  emacs         libtree-sitter.so.0.0    [.] ts_tree_cursor_goto_first_child
   3.37%  emacs         libtree-sitter.so.0.0    [.] ts_node_start_point

for :match.

And to continue the quote:

  Here's a significant jump in GC time which is almost the same as the
  difference in runtime. And all of it is spent marking?

  I suppose if the problem is allocation of a large string (many times
  over), the GC could be spending a lot of time scanning through the
  memory. Could this be avoided by passing some substitute handle to TS,
  instead of the full string? E.g. some kind of reference to it in the
  regexp cache.




