bug#60953: The :match predicate with large regexp in tree-sitter font-lo

bug-gnu-emacs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#60953: The :match predicate with large regexp in tree-sitter font-lo

From:	Dmitry Gutov
Subject:	bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date:	Tue, 24 Jan 2023 06:04:07 +0200
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2

Cc-ing Yuan, just in case.

On 20/01/2023 05:53, Dmitry Gutov wrote:

In my benchmarking -- using this form intest/lisp/progmodes/ruby-mode-resources/ruby.rb after enablingruby-ts-mode:

(benchmark-run 1000 (progn (font-lock-mode -1) (font-lock-mode 1)(let (treesit--font-lock-fast-mode) (font-lock-ensure))))


the rule added to its font-lock in commit d66ac5285f7

    :language language
    :feature 'builtin-functions
    `((((identifier) @font-lock-builtin-face)
       (:match ,ruby-ts--builtin-methods
        @font-lock-builtin-face)))

...seems to have made it 50% slower.

The profile looked like this:

   9454  84%                   - font-lock-fontify-region
   9454  84%                    - font-lock-default-fontify-region
   8862  79%                     - font-lock-fontify-syntactically-region
   8702  78%                      - treesit-font-lock-fontify-region
    128   1%                         treesit-fontify-with-override
    123   1%                         facep
     84   0% treesit--children-covering-range-recurse
     60   0%                       + ruby-ts--comment-font-lock
      4   0%                       + font-lock-unfontify-region
    568   5%                     + font-lock-fontify-keywords-region
     16   0%                     + font-lock-unfontify-region

So there's nothing on the Lisp level to look at.

I've done some perf recordings now. It seems most/all of the differencecomes down to garbage collection. Or more concretely, time spent insideprocess_mark_stack.


Without the added query benchmark reports:

(10.13723333 49 1.141649534999999)

And the perf top5 is:

17.26% emacs libtree-sitter.so.0.0 [.]ts_tree_cursor_current_status10.83% emacs libtree-sitter.so.0.0 [.]ts_tree_cursor_goto_next_sibling10.18% emacs libtree-sitter.so.0.0 [.]ts_tree_cursor_goto_first_child

   8.37%  emacs         emacs                       [.] process_mark_stack
   4.63%  emacs         libtree-sitter.so.0.0       [.] ts_node_start_point

With this simple query that colors everything:

   :language language
   :feature 'builtin-function
   `((((identifier) @font-lock-builtin-face)))

I get:

(11.993968995 82 1.9326509270000045)

Note the jump in runtime that's larger than the jump in GC.

   8.37%  emacs         emacs                       [.] process_mark_stack
   4.63%  emacs         libtree-sitter.so.0.0       [.] ts_node_start_point

The current query looks like this:

   :language language
   :feature 'builtin-function
   `((((identifier) @font-lock-builtin-face)
      (:pred ruby-ts--builtin-method-p @font-lock-builtin-face)))

Benchmarking:

(12.493614359 107 2.558609025999999)

15.30% emacs libtree-sitter.so.0.0 [.]ts_tree_cursor_current_status

  14.92%  emacs         emacs                       [.] process_mark_stack

9.75% emacs libtree-sitter.so.0.0 [.]ts_tree_cursor_goto_next_sibling8.90% emacs libtree-sitter.so.0.0 [.]ts_tree_cursor_goto_first_child

   3.87%  emacs         libtree-sitter.so.0.0       [.] ts_node_start_point

Here we get the same jump in runtime as in GC. Even though this ruleends up coloring much fewer (almost none) nodes in the current buffer. Iinterpret the results like this:

- The jump in runtime of the previous query was probably related to thenumber of nodes needed to be processed, but not with the resultinghighlighting, even though every identifier in the buffer ends up beingcolored.


- The GC overhead created by the predicates is non-negligible.

And the original query that I tried:

   :language language
   :feature 'builtin-function
   `((((identifier) @font-lock-builtin-face)
      (:match ,ruby-ts--builtin-methods @font-lock-builtin-face)))

Benchmarking:

(16.433451865000002 249 5.908674810000001)

  23.72%  emacs         emacs                    [.] process_mark_stack

12.33% emacs libtree-sitter.so.0.0 [.]ts_tree_cursor_current_status7.96% emacs libtree-sitter.so.0.0 [.]ts_tree_cursor_goto_next_sibling7.38% emacs libtree-sitter.so.0.0 [.]ts_tree_cursor_goto_first_child

   3.37%  emacs         libtree-sitter.so.0.0    [.] ts_node_start_point

Here's a significant jump in GC time which is almost the same as thedifference in runtime. And all of it is spent marking?

I suppose if the problem is allocation of a large string (many timesover), the GC could be spending a lot of time scanning through thememory. Could this be avoided by passing some substitute handle to TS,instead of the full string? E.g. some kind of reference to it in theregexp cache.

[Prev in Thread]

Current Thread

[Next in Thread]

bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, Dmitry Gutov, 2023/01/19
- bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, Dmitry Gutov <=
  - bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, Yuan Fu, 2023/01/24
    - bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, Dmitry Gutov, 2023/01/24
    - bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, Eli Zaretskii, 2023/01/25
    - bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, Dmitry Gutov, 2023/01/25
    - bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, Eli Zaretskii, 2023/01/26
    - bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, Yuan Fu, 2023/01/26
    - bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, Eli Zaretskii, 2023/01/26
    - bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, Dmitry Gutov, 2023/01/26
    - bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, Eli Zaretskii, 2023/01/26
    - bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, Dmitry Gutov, 2023/01/26

Prev by Date: bug#60983: 29.0.60; Tree-sitter user-level control
Next by Date: bug#61022: 28.2.50; Mouse tracking of high coordinates not working in rxvt-unicode
Previous by thread: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Next by thread: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Index(es):
- Date
- Thread