emacs-devel

Re: Make regexp handling more regular


From: Lars Ingebrigtsen
Subject: Re: Make regexp handling more regular
Date: Thu, 03 Dec 2020 09:31:56 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)

Stefan Kangas <stefankangas@gmail.com> writes:

> I like the idea of adding an entirely new built-in API based on the
> current state of the art.  I would begin such a project by looking into
> what other Lisps are doing, such as CL, Clojure, Guile and Racket.  Why
> shouldn't Emacs Lisp be best-in-class?

Sure.

Common Lisp doesn't have regexps built in, but (some) implementations do,
and there are a bunch of libraries, like http://edicl.github.io/cl-ppcre/.
I'm not much in favour of its style:

* (scan "(a)*b" "xaaabd")
1
5
#(3)
#(4)

* (let ((s (create-scanner "(([a-c])+)x")))
    (scan s "abcxy"))
0
4
#(0 2)
#(3 3)

And since it's Common Lisp, of course you have macros for
destructuring:

* (register-groups-bind (first second third fourth)
      ("((a)|(b)|(c))+" "abababc" :sharedp t)
    (list first second third fourth))
("c" "a" "b" "c")

Guile: https://www.gnu.org/software/guile/manual/html_node/Regexp-Functions.html

(string-match "[0-9][0-9][0-9][0-9]" "blah2002")
⇒ #("blah2002" (4 . 8))

(map match:substring (list-matches "[a-z]+" "abc 42 def 78"))
⇒ ("abc" "def")
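For comparison, a common way to get the same list in current Emacs Lisp is a loop over `string-match'.  This is a minimal sketch; `my-all-matches' is a hypothetical helper name, not an existing Emacs function:

```elisp
;; Hypothetical helper (not part of Emacs): collect every substring
;; of STRING that REGEXP matches, scanning left to right.
(defun my-all-matches (regexp string)
  (let ((start 0)
        (matches nil))
    ;; Stop once START runs past the end of STRING; `string-match'
    ;; would signal an error for such a start position.
    (while (and (<= start (length string))
                (string-match regexp string start))
      (push (match-string 0 string) matches)
      ;; Resume after this match, stepping one character past an
      ;; empty match so the loop always terminates.
      (setq start (if (= (match-beginning 0) (match-end 0))
                      (1+ (match-end 0))
                    (match-end 0))))
    (nreverse matches)))

(my-all-matches "[a-z]+" "abc 42 def 78")
;; ⇒ ("abc" "def")
```

The bookkeeping (start offsets, the empty-match case) is exactly what a function like Guile's `list-matches' hides from the caller.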

Clojure: https://purelyfunctional.tv/mini-guide/regexes-in-clojure/

(re-matches #"abc(.*)" "abcxyz")
⇒ ["abcxyz" "xyz"]

I.e., if the regexp has no groups, you get the matched string back,
otherwise a vector of the whole match and its groups.  It's nice in one
way, but the cleverness leads to errors when (re-)writing code.  You
might start with:

(subs (re-find #"[a-z]+" "fooo baar") 3)

but then you add some groups, and you have to rewrite it to something like:

(let [[_ s1 s2] (re-matches #"([a-z]+) ([a-z]+)" full-name)]
  (subs s1 3))

I hate that.

The thing that makes looking at other languages here slightly less
useful is that Emacs has buffers.  We're often not interested in the
(sub-)matches themselves at all, but instead their buffer positions
(i.e., match-beginning/end).
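To illustrate the buffer-oriented style: after a successful `re-search-forward', callers typically read positions out of the match data with `match-beginning' and `match-end' rather than extracting strings.  A minimal sketch (the function name `my-demo-match-positions' is hypothetical):

```elisp
;; Hypothetical demo function: search a buffer and return the buffer
;; positions of the whole match and of subgroup 1.
(defun my-demo-match-positions ()
  (with-temp-buffer
    (insert "blah2002")
    (goto-char (point-min))
    (when (re-search-forward "blah\\([0-9]+\\)" nil t)
      (list (match-beginning 0) (match-end 0)
            (match-beginning 1) (match-end 1)))))

(my-demo-match-positions)
;; ⇒ (1 9 5 9)   ; buffer positions start at 1
```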

> As for naming, how about just using a short prefix such as "re-"?
> AFAICT, we currently have only five functions using that prefix.

Sure.

> Tangentially, I have always wondered if it's feasible to add a new
> regular expression type to `read' where you don't have to incessantly
> double-escape all special characters.  (One could take inspiration from
> Python, for example, which lets you prefix a string with "r" to get a raw
> string, handy for regexps: r"regexp".)
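As a concrete example of the escaping problem: inside an Emacs Lisp string literal every regexp backslash has to be doubled, so the regexp \(foo\|bar\)baz is written as below.  A small sketch; `my-demo-regexp' is just an illustrative name:

```elisp
;; The regexp \(foo\|bar\)baz, written as an Emacs Lisp string:
;; every backslash must itself be escaped.
(defconst my-demo-regexp "\\(foo\\|bar\\)baz")

(string-match my-demo-regexp "xbarbaz")   ; ⇒ 1
(match-string 1 "xbarbaz")                ; ⇒ "bar"
```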

I'm all for adding a regexp object type (and a new read syntax), but I
think it's somewhat orthogonal?  Not totally, though: I've long wished
for the match/search functions to be generic, and to work differently on
strings and regexps.  That is, if fed a string, compare with
`string-equal', and if fed a regexp, compare with `string-match'.

So you could say

(search-forward "foo")

and

(search-forward #r"fo+")

or

(search-forward (re-make "fo+"))

-- no reason for there to be separate functions if we have regexp objects.
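The generic dispatch could be prototyped today with `cl-defgeneric' and `cl-defmethod', using a struct as a stand-in regexp object.  This is only a sketch under assumptions: `my-regexp', `my-re-make' and `my-search-forward' are hypothetical names standing in for the proposed `re-make' and a generic `search-forward', and no new read syntax is involved:

```elisp
(require 'cl-lib)

;; Hypothetical regexp object, standing in for the proposed `re-make'.
(cl-defstruct (my-regexp (:constructor my-re-make (source)))
  source)

(cl-defgeneric my-search-forward (pattern &optional bound noerror)
  "Search forward from point for PATTERN (hypothetical generic).")

;; A plain string is searched for literally.
(cl-defmethod my-search-forward ((pattern string) &optional bound noerror)
  (search-forward pattern bound noerror))

;; A regexp object is searched for as a regexp.
(cl-defmethod my-search-forward ((pattern my-regexp) &optional bound noerror)
  (re-search-forward (my-regexp-source pattern) bound noerror))

;; Both calls return point after the match.
(with-temp-buffer
  (insert "xfoooy")
  (list (progn (goto-char (point-min)) (my-search-forward "foo"))
        (progn (goto-char (point-min)) (my-search-forward (my-re-make "fo+")))))
;; ⇒ (5 6)
```

Dispatching on the argument type keeps a single entry point; a real implementation would presumably need the same treatment for the matching and replacing functions, not just searching.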

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no


