From: Dmitry Gutov
Subject: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Thu, 26 Jan 2023 19:15:51 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2

On 26/01/2023 10:10, Eli Zaretskii wrote:
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Wed, 25 Jan 2023 23:17:25 -0800
>> Cc: Dmitry Gutov <dgutov@yandex.ru>, 60953@debbugs.gnu.org
>>
>>>> Switching to using :pred with function (like I did in commit
>>>> d94dc606a0934) which still uses buffer-substring inside is
>>>> significantly faster.
>>>
>>> If the performance issue is fixed, then the only aspect that we should
>>> perhaps try to improve is consing. Consing a string each time you need
>>> to fontify increases the GC pressure, so if there's a good way of
>>> avoiding that without performance degradation, we should take it. Is
>>> it possible to use your :pred technique in a way that doesn't need to
>>> produce strings from buffer text?
>>
>> Why is :pred more performant though? They just use string-match-p. If
>> anything, the :pred predicates should be more expensive, since they
>> execute lisp functions and conses tree-sitter nodes into lisp objects.
>
> Yes, exactly my thoughts. Perhaps Dmitry could present comparison of
> profiles from perf which would allow us to understand the reason(s)?

I believe I did that in the second message in this thread: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=60953#8

To quote the specific profiles, it's

  15.30%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
  14.92%  emacs  emacs                  [.] process_mark_stack
   9.75%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
   8.90%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
   3.87%  emacs  libtree-sitter.so.0.0  [.] ts_node_start_point

for :pred vs.

  23.72%  emacs  emacs                  [.] process_mark_stack
  12.33%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
   7.96%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
   7.38%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
   3.37%  emacs  libtree-sitter.so.0.0  [.] ts_node_start_point

for :match.

And to continue the quote:

Here's a significant jump in GC time which is almost the same as the difference in runtime. And all of it is spent marking?

I suppose if the problem is allocation of a large string (many times over), the GC could be spending a lot of time scanning through the memory. Could this be avoided by passing some substitute handle to TS, instead of the full string? E.g. some kind of reference to it in the regexp cache.
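
For anyone following along, the two query forms being compared look roughly like the sketch below. This is only an illustration: `my-builtin-regexp', `my-builtin-p' and the two query variables are hypothetical names, the word list is a tiny stand-in for the large regexp in question, and none of it is copied from commit d94dc606a0934.

```elisp
;; Sketch only: every `my-' name here is hypothetical.
(require 'treesit)

(defvar my-builtin-regexp
  (regexp-opt '("puts" "print" "require") 'symbols)
  "Small stand-in for the large regexp discussed in this thread.")

;; Variant 1: the :match predicate, with the regexp embedded in the query.
(defvar my-query-with-match
  `(((identifier) @font-lock-builtin-face
     (:match ,my-builtin-regexp @font-lock-builtin-face))))

;; Variant 2: the :pred predicate, which calls a Lisp function on the node.
(defun my-builtin-p (node)
  "Return non-nil if the text of NODE matches `my-builtin-regexp'."
  ;; `treesit-node-text' with a non-nil second argument returns the
  ;; text without properties; it still conses a string per call.
  (string-match-p my-builtin-regexp (treesit-node-text node t)))

(defvar my-query-with-pred
  '(((identifier) @font-lock-builtin-face
     (:pred my-builtin-p @font-lock-builtin-face))))
```

Both variants still create a string from the buffer text for each candidate node. The visible difference is that the first one carries the regexp string inside the query itself, while the second only names a function.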