bug#60953: The :match predicate with large regexp in tree-sitter font-lo

bug-gnu-emacs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#60953: The :match predicate with large regexp in tree-sitter font-lo

From:	Dmitry Gutov
Subject:	bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date:	Thu, 26 Jan 2023 20:07:30 +0200
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2

On 26/01/2023 08:50, Eli Zaretskii wrote:

Date: Thu, 26 Jan 2023 01:21:08 +0200
Cc: casouri@gmail.com, 60953@debbugs.gnu.org
From: Dmitry Gutov <dgutov@yandex.ru>

Thank you. Unfortunately, the performance improvement from this patch is
still fairly negligible.


This is quite strange, since all of the approaches basically use the
same primitives under the hood.  Perhaps the reason for the slowness
is that the code which computes the text span of a node is slow?

That code seems to be the same between the two options:treesit_predicate_capture_name_to_text basically does the same astreesit-node-text (except in C) after iterating through a Lisp list tofind the node. ruby-ts--builtin-method-p calls treesit-node-text.

And treesit_predicate_pred does the same iteration, so the :pred optionshould just be slower, due to Lisp-related overhead. funcalls and stuff.

Otherwise, I must be missing something here, since the rest of the
code on the C level is basically the same, give or take some wrappers
that should not change the overall picture.

The query object is smaller, though. That's basically my only remaininghypothesis.

Switching to using :pred with function (like I did in commit
d94dc606a0934) which still uses buffer-substring inside is significantly
faster.


If the performance issue is fixed, then the only aspect that we should
perhaps try to improve is consing.

I wouldn't say it's "fixed", just improved. And :match really should beable to be made faster than :pred, since it'll probably be used forsimilar cases (where a lot/most of nodes match).

There seems to be a fair amount of consing going on insidetreesit-query-capture already: we wrap every TS node in our objects, weturn the captured nodes into a Lisp alist, and we turn the predicatesinto a list, turning the strings into "our" strings. The 'make_string'function creates a new copy in the memory, right?

One could hope to avoid recreating the list of predicates on everymatch, but that seems to be a limitation of the TS API:ts_query_predicates_for_pattern requires a second argument,match.pattern_index. Maybe we could memoize that, though?

In any case, that seems to explain why adding or avoiding onebuffer-substring call per match isn't moving the needle very much.

Consing a string each time you
need to fontify increases the GC pressure, so if there's a good way of
avoiding that without performance degradation, we should take it.  Is
it possible to use your :pred technique in a way that doesn't need to
produce strings from buffer text?

The only version I managed to get some (very minor) performanceimprovement is this:


(defun ruby-ts--builtin-method-p (node)
  (goto-char (treesit-node-start node))
  (let ((inhibit-changing-match-data t))

(re-search-forward ruby-ts--builtin-methods (treesit-node-end node)t)))

The improvement is like 200-300ms, whereas the difference between :matchand :pred in this benchmark is several seconds.

And if I try to bring it back to 100% correctness, to ensure that thewhole of node text is matched, I have to use narrowing (and string-startand string-end anchors in regexp):


(defvar ruby-ts--builtin-methods
  (format "\\`%s\\'" (regexp-opt (append ruby-builtin-methods-no-reqs
                                         ruby-builtin-methods-with-reqs)))
  "Ruby built-in methods.")

(defun ruby-ts--builtin-method-p (node)
  (save-restriction
    (goto-char (treesit-node-start node))
    (narrow-to-region (point) (treesit-node-end node))
    (let ((inhibit-changing-match-data t))
      (re-search-forward ruby-ts--builtin-methods nil t))))

And with that, the performance is again no better than the currentversion. If I also add save-excursion, it's worse.

[Prev in Thread]

Current Thread

[Next in Thread]

bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, (continued)

Prev by Date: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Next by Date: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Previous by thread: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Next by thread: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Index(es):
- Date
- Thread