bug-gnu-emacs
From: Dmitry Gutov
Subject: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Mon, 30 Jan 2023 20:20:46 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2

On 30/01/2023 19:49, Eli Zaretskii wrote:
Date: Mon, 30 Jan 2023 19:15:07 +0200
Cc: casouri@gmail.com, 60953@debbugs.gnu.org
From: Dmitry Gutov <dgutov@yandex.ru>

fast_looking_at already does an anchored match, so I'm not sure I
follow.  I don't even understand why you need the \` part, when the
match will either always start from the first position or fail.

The regexp might include the anchors, or it might not.

It might also use a different anchor like ^ or $ or \b.

OK, but it always goes only forward, so narrowing to the beginning
shouldn't be necessary. Right?

Are you saying that fast_looking_at ("\\`", ...) will always succeed?

And fast_looking_at ("^", ...), etc.

I would imagine that only fast_looking_at ("\\=", ...) is guaranteed to succeed.
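
To make the anchor question concrete, here is a minimal Lisp-level sketch. It uses `looking-at' as a stand-in for the C-level fast_looking_at (an assumption: the two need not behave identically in every detail), and the helper name my/anchor-demo and the sample text are hypothetical, chosen only for illustration.

```elisp
;; Hypothetical helper, for illustration only.  `looking-at' stands in
;; for the C function fast_looking_at discussed in this thread.
(defun my/anchor-demo ()
  "Return a list of anchor match results at a position inside a buffer."
  (with-temp-buffer
    (insert "foo carpet bar")
    (goto-char 5)                       ; point is just before "carpet"
    (list
     (looking-at "\\`")                 ; buffer start: fails mid-buffer
     (looking-at "^")                   ; line start: fails mid-line
     (looking-at "\\=")                 ; point itself: succeeds
     (save-restriction
       ;; Narrow to the "node" text so \` and \' refer to its bounds.
       (narrow-to-region 5 11)
       (and (looking-at "\\`carpet\\'") t)))))

(my/anchor-demo)
;; => (nil nil t t)
```

Without the narrowing, the last form would fail, since \` and \' refer to the ends of the accessible portion of the buffer rather than to the node.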

And you can use the LIMIT argument to
limit how far it goes forward, right?  So once again, why narrow?

I tried to explain that the user/programmer has certain expectations about which anchors are allowed in the :match regexp and what their effects are, and those seem hard to support without narrowing.

And for \', just compare the length of the match returned by
fast_looking_at with the length of the text.

This seems to work, i.e. even when point is before "carpet",

(and (looking-at (regexp-opt '("car" "cardigan" "carpet")))
     (match-string 0))

returns the full match. I was expecting that it could return just "car"
-- not sure why it doesn't stop there.

Because regex search is greedy?

Cool. TIL, thanks. That's not going to help here, but might in other situations when my code controls the regexp as well.
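
A small sketch of why controlling the regexp matters here. As far as I understand it, the full match above comes from the regexp that `regexp-opt' builds (by default it is documented to match the longest string possible), whereas a hand-written alternation takes the first alternative that matches. The helper name my/match-at-start is hypothetical.

```elisp
;; Hypothetical helper, for illustration only.
(defun my/match-at-start (regexp text)
  "Return the text matched by REGEXP at the start of TEXT, or nil."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (and (looking-at regexp)
         (match-string 0))))

;; regexp-opt output: the longest candidate wins.
(my/match-at-start (regexp-opt '("car" "cardigan" "carpet")) "carpet")
;; => "carpet"

;; Plain alternation in the same order: the first matching branch wins.
(my/match-at-start "car\\|cardigan\\|carpet" "carpet")
;; => "car"
```

So comparing the match length against the node length is only reliable when the regexp is known to prefer the longest alternative, which is not guaranteed for an arbitrary user-supplied :match regexp.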

One possible alternative, I suppose, would be to create a raw pointer to
a part of the buffer text and call re_search directly specifying the
known length of the node in bytes. If buffer text is one contiguous
region in memory, that is.

It isn't, though: there's the gap.  Which is why doing this is not
recommended; instead, use something like search_buffer_re, which
already handles this complication for you.  (Except that
search_buffer_re is a static function, so only code in search.c can
use it.  So you'd need to make it non-static.)

Interesting. Does search_buffer_re match the \` anchor at POS and \' at LIM? IOW, does it treat the rest of the buffer as nonexistent? Or could it?




