bug-gnu-emacs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#60953: The :match predicate with large regexp in tree-sitter font-lo


From: Dmitry Gutov
Subject: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Thu, 26 Jan 2023 20:07:30 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2

On 26/01/2023 08:50, Eli Zaretskii wrote:
Date: Thu, 26 Jan 2023 01:21:08 +0200
Cc: casouri@gmail.com, 60953@debbugs.gnu.org
From: Dmitry Gutov <dgutov@yandex.ru>

Thank you. Unfortunately, the performance improvement from this patch is
still fairly negligible.

This is quite strange, since all of the approaches basically use the
same primitives under the hood.  Perhaps the reason for the slowness
is that the code which computes the text span of a node is slow?

That code seems to be the same between the two options: treesit_predicate_capture_name_to_text basically does the same as treesit-node-text (except in C) after iterating through a Lisp list to find the node. ruby-ts--builtin-method-p calls treesit-node-text.

And treesit_predicate_pred does the same iteration, so the :pred option should just be slower, due to Lisp-related overhead. funcalls and stuff.

Otherwise, I must be missing something here, since the rest of the
code on the C level is basically the same, give or take some wrappers
that should not change the overall picture.

The query object is smaller, though. That's basically my only remaining hypothesis.

Switching to using :pred with function (like I did in commit
d94dc606a0934) which still uses buffer-substring inside is significantly
faster.

If the performance issue is fixed, then the only aspect that we should
perhaps try to improve is consing.

I wouldn't say it's "fixed", just improved. And :match really should be able to be made faster than :pred, since it'll probably be used for similar cases (where a lot/most of nodes match).

There seems to be a fair amount of consing going on inside treesit-query-capture already: we wrap every TS node in our objects, we turn the captured nodes into a Lisp alist, and we turn the predicates into a list, turning the strings into "our" strings. The 'make_string' function creates a new copy in the memory, right?

One could hope to avoid recreating the list of predicates on every match, but that seems to be a limitation of the TS API: ts_query_predicates_for_pattern requires a second argument, match.pattern_index. Maybe we could memoize that, though?

In any case, that seems to explain why adding or avoiding one buffer-substring call per match isn't moving the needle very much.

Consing a string each time you
need to fontify increases the GC pressure, so if there's a good way of
avoiding that without performance degradation, we should take it.  Is
it possible to use your :pred technique in a way that doesn't need to
produce strings from buffer text?

The only version I managed to get some (very minor) performance improvement is this:

(defun ruby-ts--builtin-method-p (node)
  (goto-char (treesit-node-start node))
  (let ((inhibit-changing-match-data t))
(re-search-forward ruby-ts--builtin-methods (treesit-node-end node) t)))

The improvement is like 200-300ms, whereas the difference between :match and :pred in this benchmark is several seconds.

And if I try to bring it back to 100% correctness, to ensure that the whole of node text is matched, I have to use narrowing (and string-start and string-end anchors in regexp):

(defvar ruby-ts--builtin-methods
  (format "\\`%s\\'" (regexp-opt (append ruby-builtin-methods-no-reqs
                                         ruby-builtin-methods-with-reqs)))
  "Ruby built-in methods.")

(defun ruby-ts--builtin-method-p (node)
  (save-restriction
    (goto-char (treesit-node-start node))
    (narrow-to-region (point) (treesit-node-end node))
    (let ((inhibit-changing-match-data t))
      (re-search-forward ruby-ts--builtin-methods nil t))))

And with that, the performance is again no better than the current version. If I also add save-excursion, it's worse.





reply via email to

[Prev in Thread] Current Thread [Next in Thread]