emacs-devel

Re: modern regexes in emacs


From: Daniel Pittman
Subject: Re: modern regexes in emacs
Date: Wed, 27 Feb 2019 13:18:09 -0500

On Wed, Feb 27, 2019 at 8:53 AM Mattias Engdegård <address@hidden> wrote:
> On 26 Feb 2019, at 15.33, Andreas Schwab <address@hidden> wrote:
>>
>> If you want to byte-compile a form that contains a regexp object, a
>> proper read syntax is required.
>>
>> The object types without read syntax are rather ephemeral, unlikely to
>> occur in byte-compiled forms.
>
> Thanks for pointing that out. I'm not sure how it would work -- please bear with me.
>
> Suppose we want to write (looking-at (pcre "a(b|c)")).
> Then `pcre' is a macro returning a mutable object with the regexp in some canonical form -- a traditional Emacs regexp, perhaps, or normalised rx or something else. The object also has space for the internal compiled pattern, roughly what struct re_pattern_buffer holds today.
>
> As Richard pointed out, it is polite to make the object human-readable (for debugging, if nothing else). This means that either we are satisfied with the readability of the canonical form, or the original pattern is kept around for this purpose.
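
The object described above can be sketched in Emacs Lisp roughly as follows. This is a hypothetical illustration, not a proposed API: the names `my-regexp`, `my-rx` and `my-regexp-looking-at` are invented, an `rx` form stands in for the PCRE string (translating PCRE syntax is out of scope here), and the `compiled` slot only marks where a cached compiled pattern would live. It assumes Emacs 26 or later, where `cl-defstruct` objects are records with the `#s(...)` read syntax; that read syntax is what lets the macro expand directly to the object and still survive byte-compilation.

```elisp
;;; -*- lexical-binding: t; -*-
;; Hypothetical sketch of a regexp object; all my-* names are invented.
(require 'cl-lib)
(require 'rx)

;; cl-defstruct objects are records, so they print as #s(...) and can
;; be read back, which is what byte-compiling a constant requires.
(cl-defstruct (my-regexp (:constructor my-regexp--make))
  source     ; pattern as written, kept for human readability
  canonical  ; the same pattern as a traditional Emacs regexp string
  compiled)  ; placeholder for a cached compiled pattern; unused here

(defmacro my-rx (&rest forms)
  "Expand to a `my-regexp' object built from the rx FORMS.
The object is constructed at macro-expansion time and is itself the
expansion, so compiled code embeds it as a constant."
  (my-regexp--make :source (cons 'rx forms)
                   :canonical (rx-to-string (cons 'seq forms) t)))

(defun my-regexp-looking-at (re)
  "Like `looking-at', but take a `my-regexp' object RE."
  (looking-at (my-regexp-canonical re)))
```

With point at the start of the text "acd", `(my-regexp-looking-at (my-rx "a" (group (or "b" "c"))))` returns non-nil and `(match-string 1)` is "c". Keeping the `rx` form in `source` is one way to keep the original pattern around for readability; the printed record shows it next to the canonical string.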

This is something of an outsider's opinion, but it is based on many years of helping junior developers get up to speed in a wide range of languages, so I'd like to think my suggestion here is useful. Other languages express regex literals with the equivalent of a CL reader macro, or of the record literal syntax #s(...):

Clojure: #"..."
JavaScript and many others: /.../
Racket: #rx"..." and #px"..." for basic and Perl-compatible regexps respectively.
Dart, and a few others: r"...", or r'...', or a tagged prefix such as $r"..." or %r/.../

Of those, the most Emacs Lisp-ish would be something like the Racket syntax, since it supports both kinds: for example `#r"..."`, `#pcre"..."`, or even `#rx(...)`.

I'd personally suggest that an additional reader (macro) syntax, also used in the printed form, is the most user-friendly option. The S-expression form is a little less friendly, but in my eyes it is the best fallback: the printed representation would be `(pcre "...")` and so on. That works, but it doesn't give the "compiled expression" an identity distinct from the functions that create it, and I think separating them is the correct choice.
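
The trade-off between the two printed forms can be sketched in Emacs Lisp. This is a hypothetical illustration assuming Emacs 26 or later: the names `my-pcre-regexp`, `my-pcre` and `my-pcre-as-sexp` are invented, and the object carries only its source string. The existing record syntax `#s(...)` stands in for a dedicated reader syntax, since Emacs Lisp code cannot define new reader macros itself.

```elisp
;;; -*- lexical-binding: t; -*-
;; Hypothetical sketch contrasting two printed forms; names are invented.
(require 'cl-lib)

;; A deliberately tiny object holding only the pattern source.
;; `my-pcre' is a positional constructor: (my-pcre "a(b|c)").
(cl-defstruct (my-pcre-regexp (:constructor my-pcre (source)))
  source)

(defun my-pcre-as-sexp (re)
  "Return a form that, when evaluated, rebuilds an object equal to RE."
  (list 'my-pcre (my-pcre-regexp-source re)))
```

`(prin1-to-string (my-pcre "a(b|c)"))` gives `#s(my-pcre-regexp "a(b|c)")`, which `read` turns straight back into an object of type `my-pcre-regexp`. `(prin1-to-string (my-pcre-as-sexp (my-pcre "a(b|c)")))` gives `(my-pcre "a(b|c)")`, which `read` turns into an ordinary list; only `eval` makes it a regexp object again. That difference is the distinct identity argued for above.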

