Re: Ugly regexps

From: Alan Mackenzie
Subject: Re: Ugly regexps
Date: Wed, 3 Mar 2021 20:46:12 +0000

Hello, Stefan.

On Tue, Mar 02, 2021 at 19:32:23 -0600, Stefan Kangas wrote:
> Stefan Monnier <monnier@iro.umontreal.ca> writes:

> > BTW, while this theme of ugly regexps keeps coming up, how 'bout we add
> > a new function `ere` which converts between the ERE style of regexps
> > where grouping parens are not escaped (and plain chars meant to match
> > an actual paren need to be escaped instead) to ELisp-style regexps?

> > So you can do

> >     (string-match (ere "\\(def(macro|un|subst) .{1,}"))

> > instead of

> >     (string-match "(def\\(macro\\|un\\|subst\\) .\\{1,\\}")

> > ?
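
For illustration only, the conversion proposed above could be sketched
roughly as follows.  The thread names the function `ere` but gives no
implementation, so the name `my-ere-to-elisp` and everything in the body
are assumptions.  The sketch only swaps the escaping of ( ) | { } and does
not treat bracket expressions such as [(|)] specially, which a real
implementation would have to do.

```elisp
;;; -*- lexical-binding: t -*-

;; Hypothetical helper, not part of Emacs.  It swaps the escaping of the
;; characters ( ) | { } and copies everything else through unchanged.
;; Known limitation: bracket expressions like [(|)] are not recognised.
(defun my-ere-to-elisp (ere)
  "Convert the ERE-style regexp string ERE to Emacs regexp syntax (sketch)."
  (let ((i 0)
        (len (length ere))
        (out '()))
    (while (< i len)
      (let ((c (aref ere i)))
        (cond
         ;; A backslash followed by another character.
         ((and (eq c ?\\) (< (1+ i) len))
          (let ((next (aref ere (1+ i))))
            (push (if (memq next '(?\( ?\) ?\| ?{ ?}))
                      (string next)     ; ERE \( is a literal: Emacs (
                    (string c next))    ; any other escape is kept as is
                  out)
            (setq i (+ i 2))))
         ;; A bare operator needs a backslash in Emacs syntax.
         ((memq c '(?\( ?\) ?\| ?{ ?}))
          (push (string ?\\ c) out)
          (setq i (1+ i)))
         ;; Ordinary character.
         (t
          (push (string c) out)
          (setq i (1+ i))))))
    (apply #'concat (nreverse out))))
```

Under these assumptions, (my-ere-to-elisp "\\(def(macro|un|subst) .{1,}")
returns the second string quoted above.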

> Sounds good to me.

> I was going to ask why not just do PCRE, but then I realized I'm not
> exactly sure what the syntactical differences are.  (We obviously lack
> some features.)  AFAIR, Emacs regexps don't exactly match GNU grep,
> egrep, Perl, or anything else really.

These things don't exactly match each other, do they?

> So I cranked out my dusty old copy of Mastering Regular Expressions and
> found this overview:

>     grep           egrep          Emacs          Perl
>     \? \+ \|       ? + |          ? + \|         ? + |
>     \( \)          ( )            \( \)          ( )
>                    \< \>          \< \> \b \B    \b \B

>     (Excerpt from Mastering Regular Expressions: Table 3-3: A (Very)
>     Superficial Look at the Flavor of a Few Common Tools)

> This shows the differences that most commonly bite you, in my
> experience.

The "biting" effect is surely small.  I have little difficulty using
grep, egrep and awk, all of whose regexp notations differ somewhat.

> While we're at it, has it ever been discussed to add support for the
> pcre library side-by-side with our homegrown regexp.c?  It would give us
> sane (standard) syntax and some useful features "for free"
> (e.g. lookaround).  I didn't test but a priori I would also assume the
> code to be much more performant than anything we could ever cook up
> ourselves.  It is used by several high-profile projects.

> I would imagine we'd introduce entirely new function names for it.
> Perhaps even a completely new and improved API like Lars suggested a
> while back.

No, No, No, No!

All these tools have one overarching thing in common, and that is they
each have a single variety of regexp.  That is, with the exception of
Emacs, which also has a radically different source form, namely rx.
Somebody pointed out the relatively small use of rx, and the same might
happen for a new regexp notation.  Or it might not, and we'd have two
different notations side by side.  This is surely something to avoid.

There's not a lot wrong with Emacs's regexp notation.  It works, works
well, and we're all familiar with it.  And there are many thousands of
lines of Lisp containing regexps, all of which are in the same variety.
With the exception of those written with rx.

To introduce a second (string) variety alongside Emacs regexps would
cause confusion, and suck up effort better used for productive work.
Just how is one meant to search for a regexp using grep, when one
doesn't even know whether it follows Emacs conventions or some foreign
set of conventions?

Alan Mackenzie (Nuremberg, Germany).
