bug-gnu-emacs

From: Dmitry Gutov
Subject: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Thu, 26 Jan 2023 23:26:54 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2

On 26/01/2023 22:01, Eli Zaretskii wrote:
>> Date: Thu, 26 Jan 2023 21:35:55 +0200
>> Cc: casouri@gmail.com, 60953@debbugs.gnu.org
>> From: Dmitry Gutov <dgutov@yandex.ru>
>>
>>> If you are saying that GC is responsible, then running the benchmark
>>> with gc-cons-threshold set to most-positive-fixnum should produce a
>>> more interesting profile and perhaps a more interesting comparison.
>>
>> That really helps:
>>
>> (benchmark-run 1000
>>   (progn (font-lock-mode -1) (font-lock-mode 1)
>>          (let (treesit--font-lock-fast-mode) (font-lock-ensure))))
>>
>> => (16.078430587 251 5.784299419999996)
>>
>> (let ((gc-cons-threshold most-positive-fixnum))
>>   (benchmark-run 1000
>>     (progn (font-lock-mode -1) (font-lock-mode 1)
>>            (let (treesit--font-lock-fast-mode) (font-lock-ensure)))))
>>
>> => (10.369389725 0 0.0)
>>
>> Do you want a perf profile for the latter? It might not be very useful.
>
> I'd be interested in comparing the profiles of the two techniques, the
> :pred and the :match, with GC disabled like that.

Curiously, :pred is still faster, but the difference is much smaller:

pred:

(9.212951344 0 0.0)

  18.23%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
  11.61%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
  11.43%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
   5.00%  emacs  libtree-sitter.so.0.0  [.] ts_node_start_point
   4.02%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_parent_node
   3.97%  emacs  emacs                  [.] re_match_2_internal
   3.36%  emacs  libtree-sitter.so.0.0  [.] ts_language_symbol_metadata
   2.45%  emacs  emacs                  [.] parse_str_as_multibyte
   1.95%  emacs  emacs                  [.] exec_byte_code
   1.66%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_current_node
   1.66%  emacs  libtree-sitter.so.0.0  [.] ts_node_end_point
   1.30%  emacs  emacs                  [.] allocate_vectorlike
   1.24%  emacs  emacs                  [.] find_interval

match:

(10.059083317 0 0.0)

  19.23%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
  12.41%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
  11.22%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
   5.21%  emacs  libtree-sitter.so.0.0  [.] ts_node_start_point
   4.22%  emacs  emacs                  [.] re_match_2_internal
   3.97%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_parent_node
   3.64%  emacs  libtree-sitter.so.0.0  [.] ts_language_symbol_metadata
   2.36%  emacs  emacs                  [.] exec_byte_code
   1.66%  emacs  libtree-sitter.so.0.0  [.] ts_node_end_point
   1.62%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_current_node
   1.34%  emacs  libtree-sitter.so.0.0  [.] ts_node_end_byte
   1.28%  emacs  emacs                  [.] allocate_vectorlike
   0.95%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_parent

This is with the current code and disabled GC. No additional changes to treesit.c.

>>> (But I thought you concluded that GC alone cannot explain the
>>> difference in performance?)
>>
>> I'm inclined to think the difference is related to copying of the regexp
>> string, but whether the time is spent in actually copying it, or
>> scanning its copies for garbage later, it was harder to say. Seems like
>> it's the latter, though.
>
> If we can avoid the copying, I think it's desirable in any case.  They
> are constant regexps, aren't they?

Yes, but how?

Memoization is one possible step, but then we only avoid re-creating the predicate structures for each match. We still send a pretty large query and, apparently, get it back..? Might be some copying involved there.

TBH the moderate success the memoization patch shows has me stumped.




