[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Add some aliases for re-related functions

From: Dmitry Gutov
Subject: Re: Add some aliases for re-related functions
Date: Mon, 4 May 2020 03:29:02 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0

On 03.05.2020 06:55, Stefan Monnier wrote:
Think of it as the case where the regexp starts with \` and ends with \'
Then there's the relaxation of "finding the longest match" (what we
call `looking-at`) and then "finding the leftmost longest match" (what
we call `string-match`).

looking-at being a special case of re-search-forward, I take?

Not sure what you mean by that.

More or less that (looking-at "abc") is basically the same as (re-search-forward "\\=abc"). Though maybe not internally.

`re-search-forward` is in the same
category as `string-match`, i.e. it's a *search* operation, whereas
`looking-at` is a *match* operation.

IOW a "match" operation is like a "search" but where `match-beginning`.

Algorithmically, the two are close, but there's a bit more work to go
from one to the other than meets the eye:

Thanks for the explanation.

If you take a regexp and turn it into a DFA in the usual way, you get an
automaton that can trivially (and in O(n) time) give you either the
shortest match or the longest match.  But if you want it to search, you
have to add a loop around it to try matching at every possible start
position, which brings the complexity to O(n²) :-(

To fix that you can try and compile ".*RE" instead of "RE" and that will
give you an automaton that can do the search or "RE" in O(n) time, but
it won't directly give you the "leftmost longest match" (instead it can
directly give you "the match whose match-end is closest" and "the match
whose match-end is furthest").

But that's what we generally want in practice anyway. And in the cases where the desired out come is different, the regexp is probably ambiguous, which often means worse performance.

Note: Emacs's regexp engine isn't based on a DFA, and doesn't try and
use the second option: our engine basically only does "matching" and to
get the search behavior we add a loop around it, so algorithmically,
`looking-at` and `string-match/re-search-forward` are quite different.

Notice that we don't really have the equivalent of `looking-at`
on strings currently :-(

Those two have traditionally be named `re_match` and `re_search`
respectively in C libraries (as can be seen in `src/regexp-emacs.c`).
Yes, ok. But we also need names to distinguish that things happen in
a buffer. So far we've used 'search' for those.
Using the term 'search' for matching in strings might be a significant
change, given existing expectations.

Yes, it's unfortunate.  Maybe we could/should merge them to clarify:

     (re-match REGEXP &optional STRING LIMIT START)
     (re-search REGEXP &optional STRING LIMIT START)

would be like `looking-at` but would operate on STRING instead of
`current-buffer` if STRING is non-nil.  START defaults to point for
current-buffer and 0 for a string.

re-search-forward also moves point, whereas string-match returns the index of the match start. I think it would be kinda ugly to choose among these behaviors based on the second argument. And if it returns point instead, without moving, we get an entirely different function now.

I'm not sure it's worth the changeover and retraining everybody if the main benefit is being more aware of the underlying algorithm complexity. After all, it's not too hard to make an educated guess that looking-at is (or at least can be) faster than re-search-forward.

PS: BTW, `looking-back` doesn't do a "match" of the "longest match that
ends at point" but a "search" for the "rightmost longest match that ends
at point" since it uses `re-search-backward` internally.
It's a weird function, I agree. Though it's proved to be a handy one.

Yes.  The functionality it offers is important, but in reality one would
want a "real" `looking-back` which uses a backward match, rather than
the current "backward search for a forward match" hack.  It would be
both more efficient and provide a cleaner behavior.

It suppose so. Yet, in all cases I had to rewrite looking-back calls to add the now-mandatory argument, the resulting time it took was fast enough to get lost in the measurement noise.

Maybe an optimized version could enable some new use cases, though.

reply via email to

[Prev in Thread] Current Thread [Next in Thread]