From: | Dmitry Gutov |
Subject: | bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient |
Date: | Mon, 30 Jan 2023 20:20:46 +0200 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2 |
On 30/01/2023 19:49, Eli Zaretskii wrote:
> Date: Mon, 30 Jan 2023 19:15:07 +0200
> Cc: casouri@gmail.com, 60953@debbugs.gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
>
>>> fast_looking_at already does an anchored match, so I'm not sure I follow. I don't even understand why you need the \` part, when the match will either always start from the first position or fail.
>>
>> The regexp might include the anchors, or it might not. It might also use a different anchor like ^ or $ or \b.
>
> OK, but it always goes only forward, so narrowing to the beginning shouldn't be necessary. Right?

Are you saying that fast_looking_at ("\\`", ...) will always succeed? And fast_looking_at ("^", ...), etc.? I would imagine that only fast_looking_at ("\\=", ...) is guaranteed to succeed.

> And you can use the LIMIT argument to limit how far it goes forward, right? So once again, why narrow?
I tried to explain that there is a certain expectation (on the part of the user/programmer) which anchors are allowed in the :match regexp, and what their effects are, and those seem hard to support without narrowing.
>>> And for \', just compare the length of the match returned by fast_looking_at with the length of the text.
>>
>> This seems to work, i.e. even when point is before "carpet",
>>
>>   (and (looking-at (regexp-opt '("car" "cardigan" "carpet")))
>>        (match-string 0))
>>
>> returns the full match. I was expecting that it could return just "car" -- not sure why it doesn't stop there.
>
> Because regex search is greedy?
Cool. TIL, thanks. That's not going to help here, but might in other situations when my code controls the regexp as well.
>> One possible alternative, I suppose, would be to create a raw pointer to a part of the buffer text and call re_search directly specifying the known length of the node in bytes. If buffer text is one contiguous region in memory, that is.
>
> It isn't, though: there's the gap. Which is why doing this is not recommended; instead, use something like search_buffer_re, which already handles this complication for you. (Except that search_buffer_re is a static function, so only code in search.c can use it. So you'd need to make it non-static.)
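To illustrate the gap problem mentioned above, here is a minimal C sketch of a toy gap buffer. The struct gap_buffer and the gb_* helpers are hypothetical names invented for this example; they are not Emacs's actual buffer representation or API. The sketch only shows why a node's text cannot in general be handed to a matcher as a single pointer plus a byte length: a range that straddles the gap is split into two physical pieces.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy gap buffer (an assumption for illustration only).
   Physical layout of DATA: [text before gap][gap][text after gap]. */
struct gap_buffer {
    char  *data;      /* backing storage, including the gap */
    size_t gap_start; /* physical offset of the first gap byte */
    size_t gap_size;  /* number of unused bytes in the gap */
    size_t text_len;  /* number of text bytes, excluding the gap */
};

/* Return nonzero if the logical range [pos, pos + len) occupies one
   contiguous run of memory, i.e. it does not straddle the gap. */
int gb_range_is_contiguous(const struct gap_buffer *b, size_t pos, size_t len)
{
    return pos + len <= b->gap_start || pos >= b->gap_start;
}

/* Copy the logical range [pos, pos + len) into OUT, skipping the gap.
   OUT must have room for LEN bytes.  Returns the number of bytes copied. */
size_t gb_copy_range(const struct gap_buffer *b, size_t pos, size_t len,
                     char *out)
{
    assert(pos <= b->text_len && len <= b->text_len - pos);

    /* Part of the range that lies before the gap, if any. */
    size_t before = 0;
    if (pos < b->gap_start) {
        before = b->gap_start - pos;
        if (before > len)
            before = len;
        memcpy(out, b->data + pos, before);
    }

    /* Remainder lies after the gap: shift the offset by the gap size. */
    size_t after = len - before;
    if (after > 0)
        memcpy(out + before, b->data + pos + before + b->gap_size, after);

    return len;
}
```

With the text "carpet" stored as "car", a four-byte gap, then "pet", the range covering the whole word is not contiguous, so a matcher given only a raw pointer and a length of 6 would read gap bytes. A search routine either has to accept the text as two pieces or copy it out first, as gb_copy_range does here.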
Interesting. Does search_buffer_re match the \` anchor at POS and \' at LIM? IOW, does it treat the rest of the buffer as non-existing? Or could it?