[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#60953: The :match predicate with large regexp in tree-sitter font-lo
From: |
Dmitry Gutov |
Subject: |
bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient |
Date: |
Thu, 26 Jan 2023 20:07:30 +0200 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2 |
On 26/01/2023 08:50, Eli Zaretskii wrote:
Date: Thu, 26 Jan 2023 01:21:08 +0200
Cc: casouri@gmail.com, 60953@debbugs.gnu.org
From: Dmitry Gutov <dgutov@yandex.ru>
Thank you. Unfortunately, the performance improvement from this patch is
still fairly negligible.
This is quite strange, since all of the approaches basically use the
same primitives under the hood. Perhaps the reason for the slowness
is that the code which computes the text span of a node is slow?
That code seems to be the same between the two options:
treesit_predicate_capture_name_to_text basically does the same as
treesit-node-text (except in C) after iterating through a Lisp list to
find the node. ruby-ts--builtin-method-p calls treesit-node-text.
And treesit_predicate_pred does the same iteration, so the :pred option
should just be slower, due to Lisp-related overhead. funcalls and stuff.
Otherwise, I must be missing something here, since the rest of the
code on the C level is basically the same, give or take some wrappers
that should not change the overall picture.
The query object is smaller, though. That's basically my only remaining
hypothesis.
Switching to using :pred with function (like I did in commit
d94dc606a0934) which still uses buffer-substring inside is significantly
faster.
If the performance issue is fixed, then the only aspect that we should
perhaps try to improve is consing.
I wouldn't say it's "fixed", just improved. And :match really should be
able to be made faster than :pred, since it'll probably be used for
similar cases (where a lot/most of nodes match).
There seems to be a fair amount of consing going on inside
treesit-query-capture already: we wrap every TS node in our objects, we
turn the captured nodes into a Lisp alist, and we turn the predicates
into a list, turning the strings into "our" strings. The 'make_string'
function creates a new copy in the memory, right?
One could hope to avoid recreating the list of predicates on every
match, but that seems to be a limitation of the TS API:
ts_query_predicates_for_pattern requires a second argument,
match.pattern_index. Maybe we could memoize that, though?
In any case, that seems to explain why adding or avoiding one
buffer-substring call per match isn't moving the needle very much.
Consing a string each time you
need to fontify increases the GC pressure, so if there's a good way of
avoiding that without performance degradation, we should take it. Is
it possible to use your :pred technique in a way that doesn't need to
produce strings from buffer text?
The only version I managed to get some (very minor) performance
improvement is this:
(defun ruby-ts--builtin-method-p (node)
(goto-char (treesit-node-start node))
(let ((inhibit-changing-match-data t))
(re-search-forward ruby-ts--builtin-methods (treesit-node-end node)
t)))
The improvement is like 200-300ms, whereas the difference between :match
and :pred in this benchmark is several seconds.
And if I try to bring it back to 100% correctness, to ensure that the
whole of node text is matched, I have to use narrowing (and string-start
and string-end anchors in regexp):
(defvar ruby-ts--builtin-methods
(format "\\`%s\\'" (regexp-opt (append ruby-builtin-methods-no-reqs
ruby-builtin-methods-with-reqs)))
"Ruby built-in methods.")
(defun ruby-ts--builtin-method-p (node)
(save-restriction
(goto-char (treesit-node-start node))
(narrow-to-region (point) (treesit-node-end node))
(let ((inhibit-changing-match-data t))
(re-search-forward ruby-ts--builtin-methods nil t))))
And with that, the performance is again no better than the current
version. If I also add save-excursion, it's worse.
- bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, (continued)
- bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, Eli Zaretskii, 2023/01/25
- bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, Dmitry Gutov, 2023/01/25
- bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, Eli Zaretskii, 2023/01/26
- bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, Yuan Fu, 2023/01/26
- bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, Eli Zaretskii, 2023/01/26
- bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, Dmitry Gutov, 2023/01/26
- bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, Eli Zaretskii, 2023/01/26
- bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, Dmitry Gutov, 2023/01/26
- bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, Eli Zaretskii, 2023/01/26
- bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, Dmitry Gutov, 2023/01/26
- bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient,
Dmitry Gutov <=
- bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient, Dmitry Gutov, 2023/01/26