[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#60953: The :match predicate with large regexp in tree-sitter font-lo
From: |
Yuan Fu |
Subject: |
bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient |
Date: |
Tue, 24 Jan 2023 19:13:29 -0800 |
> On Jan 23, 2023, at 8:04 PM, Dmitry Gutov <dgutov@yandex.ru> wrote:
>
> Cc-ing Yuan, just in case.
>
> On 20/01/2023 05:53, Dmitry Gutov wrote:
>> In my benchmarking -- using this form in
>> test/lisp/progmodes/ruby-mode-resources/ruby.rb after enabling ruby-ts-mode:
>> (benchmark-run 1000 (progn (font-lock-mode -1) (font-lock-mode 1) (let
>> (treesit--font-lock-fast-mode) (font-lock-ensure))))
>> the rule added to its font-lock in commit d66ac5285f7
>> :language language
>> :feature 'builtin-functions
>> `((((identifier) @font-lock-builtin-face)
>> (:match ,ruby-ts--builtin-methods
>> @font-lock-builtin-face)))
>> ...seems to have made it 50% slower.
>> The profile looked like this:
>> 9454 84% - font-lock-fontify-region
>> 9454 84% - font-lock-default-fontify-region
>> 8862 79% - font-lock-fontify-syntactically-region
>> 8702 78% - treesit-font-lock-fontify-region
>> 128 1% treesit-fontify-with-override
>> 123 1% facep
>> 84 0% treesit--children-covering-range-recurse
>> 60 0% + ruby-ts--comment-font-lock
>> 4 0% + font-lock-unfontify-region
>> 568 5% + font-lock-fontify-keywords-region
>> 16 0% + font-lock-unfontify-region
>> So there's nothing on the Lisp level to look at.
>
> I've done some perf recordings now. It seems most/all of the difference comes
> down to garbage collection. Or more concretely, time spent inside
> process_mark_stack.
>
> Without the added query benchmark reports:
>
> (10.13723333 49 1.141649534999999)
>
> And the perf top5 is:
>
> 17.26% emacs libtree-sitter.so.0.0 [.]
> ts_tree_cursor_current_status
> 10.83% emacs libtree-sitter.so.0.0 [.]
> ts_tree_cursor_goto_next_sibling
> 10.18% emacs libtree-sitter.so.0.0 [.]
> ts_tree_cursor_goto_first_child
> 8.37% emacs emacs [.] process_mark_stack
> 4.63% emacs libtree-sitter.so.0.0 [.] ts_node_start_point
>
> With this simple query that colors everything:
>
> :language language
> :feature 'builtin-function
> `((((identifier) @font-lock-builtin-face)))
>
> I get:
>
> (11.993968995 82 1.9326509270000045)
>
> Note the jump in runtime that's larger than the jump in GC.
>
> 17.26% emacs libtree-sitter.so.0.0 [.]
> ts_tree_cursor_current_status
> 10.83% emacs libtree-sitter.so.0.0 [.]
> ts_tree_cursor_goto_next_sibling
> 10.18% emacs libtree-sitter.so.0.0 [.]
> ts_tree_cursor_goto_first_child
> 8.37% emacs emacs [.] process_mark_stack
> 4.63% emacs libtree-sitter.so.0.0 [.] ts_node_start_point
>
> The current query looks like this:
>
> :language language
> :feature 'builtin-function
> `((((identifier) @font-lock-builtin-face)
> (:pred ruby-ts--builtin-method-p @font-lock-builtin-face)))
>
> Benchmarking:
>
> (12.493614359 107 2.558609025999999)
>
> 15.30% emacs libtree-sitter.so.0.0 [.]
> ts_tree_cursor_current_status
> 14.92% emacs emacs [.] process_mark_stack
> 9.75% emacs libtree-sitter.so.0.0 [.]
> ts_tree_cursor_goto_next_sibling
> 8.90% emacs libtree-sitter.so.0.0 [.]
> ts_tree_cursor_goto_first_child
> 3.87% emacs libtree-sitter.so.0.0 [.] ts_node_start_point
>
> Here we get the same jump in runtime as in GC. Even though this rule ends up
> coloring much fewer (almost none) nodes in the current buffer. I interpret
> the results like this:
>
> - The jump in runtime of the previous query was probably related to the
> number of nodes needed to be processed, but not with the resulting
> highlighting, even though every identifier in the buffer ends up being
> colored.
>
> - The GC overhead created by the predicates is non-negligible.
>
> And the original query that I tried:
>
> :language language
> :feature 'builtin-function
> `((((identifier) @font-lock-builtin-face)
> (:match ,ruby-ts--builtin-methods @font-lock-builtin-face)))
>
> Benchmarking:
>
> (16.433451865000002 249 5.908674810000001)
>
> 23.72% emacs emacs [.] process_mark_stack
> 12.33% emacs libtree-sitter.so.0.0 [.]
> ts_tree_cursor_current_status
> 7.96% emacs libtree-sitter.so.0.0 [.]
> ts_tree_cursor_goto_next_sibling
> 7.38% emacs libtree-sitter.so.0.0 [.]
> ts_tree_cursor_goto_first_child
> 3.37% emacs libtree-sitter.so.0.0 [.] ts_node_start_point
>
> Here's a significant jump in GC time which is almost the same as the
> difference in runtime. And all of it is spent marking?
>
> I suppose if the problem is allocation of a large string (many times over),
> the GC could be spending a lot of time scanning through the memory. Could
> this be avoided by passing some substitute handle to TS, instead of the full
> string? E.g. some kind of reference to it in the regexp cache.
>
FYI the predicates are not processed by tree-sitter, but by us. For example,
the #equal predicate is handled by treesit_predicate_equal. For #match, right
now we create a string with Fbuffer_substring and pass it to fast_string_match,
so it definitely causes a lot of gc’s, as you observed. We can probably match
the regexp in-place, just limit the match to the range of the node.
Yuan
- bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, Dmitry Gutov, 2023/01/19
- bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, Dmitry Gutov, 2023/01/23
- bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient,
Yuan Fu <=
- bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, Dmitry Gutov, 2023/01/24
- bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, Eli Zaretskii, 2023/01/25
- bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, Dmitry Gutov, 2023/01/25
- bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, Eli Zaretskii, 2023/01/26
- bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, Yuan Fu, 2023/01/26
- bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, Eli Zaretskii, 2023/01/26
- bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, Dmitry Gutov, 2023/01/26
- bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, Eli Zaretskii, 2023/01/26
- bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, Dmitry Gutov, 2023/01/26
- bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, Eli Zaretskii, 2023/01/26