emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Add some aliases for re-related functions


From: Stefan Monnier
Subject: Re: Add some aliases for re-related functions
Date: Sat, 02 May 2020 23:55:08 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)

>> Think of it as the case where the regexp starts with \` and ends with \'
>> Then there's the relaxation of "finding the longest match" (what we
>> call `looking-at`) and then "finding the leftmost longest match" (what
>> we call `string-match`).
>
> looking-at being a special case of re-search-forward, I take?

Not sure what you mean by that.  `re-search-forward` is in the same
category as `string-match`, i.e. it's a *search* operation, whereas
`looking-at` is a *match* operation.

IOW a "match" operation is like a "search" but where `match-beginning`.

Algorithmically, the two are close, but there's a bit more work to go
from one to the other than meets the eye:

If you take a regexp and turn it into a DFA in the usual way, you get an
automaton that can trivially (and in O(n) time) give you either the
shortest match or the longest match.  But if you want it to search, you
have to add a loop around it to try matching at every possible start
position, which brings the complexity to O(n²) :-(

To fix that you can try and compile ".*RE" instead of "RE" and that will
give you an automaton that can do the search or "RE" in O(n) time, but
it won't directly give you the "leftmost longest match" (instead it can
directly give you "the match whose match-end is closest" and "the match
whose match-end is furthest").  So to get the desired "leftmost longest
match", you have to work a bit harder yet.

Note: Emacs's regexp engine isn't based on a DFA, and doesn't try and
use the second option: our engine basically only does "matching" and to
get the search behavior we add a loop around it, so algorithmically,
`looking-at` and `string-match/re-search-forward` are quite different.

Notice that we don't really have the equivalent of `looking-at`
on strings currently :-(

>> Those two have traditionally be named `re_match` and `re_search`
>> respectively in C libraries (as can be seen in `src/regexp-emacs.c`).
> Yes, ok. But we also need names to distinguish that things happen in
> a buffer. So far we've used 'search' for those.
> Using the term 'search' for matching in strings might be a significant
> change, given existing expectations.

Yes, it's unfortunate.  Maybe we could/should merge them to clarify:

    (re-match REGEXP &optional STRING LIMIT START)
    (re-search REGEXP &optional STRING LIMIT START)

would be like `looking-at` but would operate on STRING instead of
`current-buffer` if STRING is non-nil.  START defaults to point for
current-buffer and 0 for a string.

Compared to re-search-forward, this lacks the NOERROR and the COUNT args.
We could add yet more optional args, but this is getting ugly.  Not sure
how important these are, tho.
Or maybe we could change the optional START arg so it means "START" when used on
a string and it means something else (NOERROR or COUNT) when used in
a buffer (yucky)?

>> PS: BTW, `looking-back` doesn't do a "match" of the "longest match that
>> ends at point" but a "search" for the "rightmost longest match that ends
>> at point" since it uses `re-search-backward` internally.
> It's a weird function, I agree. Though it's proved to be a handy one.

Yes.  The functionality it offers is important, but in reality one would
want a "real" `looking-back` which uses a backward match, rather than
the current "backward search for a forward match" hack.  It would be
both more efficient and provide a cleaner behavior.


        Stefan




reply via email to

[Prev in Thread] Current Thread [Next in Thread]