bug-gnu-emacs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#60953: The :match predicate with large regexp in tree-sitter font-lo


From: Dmitry Gutov
Subject: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Tue, 24 Jan 2023 06:04:07 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2

Cc-ing Yuan, just in case.

On 20/01/2023 05:53, Dmitry Gutov wrote:
In my benchmarking -- using this form in test/lisp/progmodes/ruby-mode-resources/ruby.rb after enabling ruby-ts-mode:

  (benchmark-run 1000 (progn (font-lock-mode -1) (font-lock-mode 1) (let (treesit--font-lock-fast-mode) (font-lock-ensure))))

the rule added to its font-lock in commit d66ac5285f7

    :language language
    :feature 'builtin-functions
    `((((identifier) @font-lock-builtin-face)
       (:match ,ruby-ts--builtin-methods
        @font-lock-builtin-face)))

...seems to have made it 50% slower.

The profile looked like this:

   9454  84%                   - font-lock-fontify-region
   9454  84%                    - font-lock-default-fontify-region
   8862  79%                     - font-lock-fontify-syntactically-region
   8702  78%                      - treesit-font-lock-fontify-region
    128   1%                         treesit-fontify-with-override
    123   1%                         facep
     84   0% treesit--children-covering-range-recurse
     60   0%                       + ruby-ts--comment-font-lock
      4   0%                       + font-lock-unfontify-region
    568   5%                     + font-lock-fontify-keywords-region
     16   0%                     + font-lock-unfontify-region

So there's nothing on the Lisp level to look at.

I've done some perf recordings now. It seems most/all of the difference comes down to garbage collection. Or more concretely, time spent inside process_mark_stack.

Without the added query benchmark reports:

(10.13723333 49 1.141649534999999)

And the perf top5 is:

17.26% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_current_status 10.83% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_goto_next_sibling 10.18% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_goto_first_child
   8.37%  emacs         emacs                       [.] process_mark_stack
   4.63%  emacs         libtree-sitter.so.0.0       [.] ts_node_start_point

With this simple query that colors everything:

   :language language
   :feature 'builtin-function
   `((((identifier) @font-lock-builtin-face)))

I get:

(11.993968995 82 1.9326509270000045)

Note the jump in runtime that's larger than the jump in GC.

17.26% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_current_status 10.83% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_goto_next_sibling 10.18% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_goto_first_child
   8.37%  emacs         emacs                       [.] process_mark_stack
   4.63%  emacs         libtree-sitter.so.0.0       [.] ts_node_start_point

The current query looks like this:

   :language language
   :feature 'builtin-function
   `((((identifier) @font-lock-builtin-face)
      (:pred ruby-ts--builtin-method-p @font-lock-builtin-face)))

Benchmarking:

(12.493614359 107 2.558609025999999)

15.30% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_current_status
  14.92%  emacs         emacs                       [.] process_mark_stack
9.75% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_goto_next_sibling 8.90% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_goto_first_child
   3.87%  emacs         libtree-sitter.so.0.0       [.] ts_node_start_point

Here we get the same jump in runtime as in GC. Even though this rule ends up coloring much fewer (almost none) nodes in the current buffer. I interpret the results like this:

- The jump in runtime of the previous query was probably related to the number of nodes needed to be processed, but not with the resulting highlighting, even though every identifier in the buffer ends up being colored.

- The GC overhead created by the predicates is non-negligible.

And the original query that I tried:

   :language language
   :feature 'builtin-function
   `((((identifier) @font-lock-builtin-face)
      (:match ,ruby-ts--builtin-methods @font-lock-builtin-face)))

Benchmarking:

(16.433451865000002 249 5.908674810000001)

  23.72%  emacs         emacs                    [.] process_mark_stack
12.33% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_current_status 7.96% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_goto_next_sibling 7.38% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_goto_first_child
   3.37%  emacs         libtree-sitter.so.0.0    [.] ts_node_start_point

Here's a significant jump in GC time which is almost the same as the difference in runtime. And all of it is spent marking?

I suppose if the problem is allocation of a large string (many times over), the GC could be spending a lot of time scanning through the memory. Could this be avoided by passing some substitute handle to TS, instead of the full string? E.g. some kind of reference to it in the regexp cache.






reply via email to

[Prev in Thread] Current Thread [Next in Thread]