Re: Make regexp handling more regular
From: Lars Ingebrigtsen
Subject: Re: Make regexp handling more regular
Date: Thu, 03 Dec 2020 09:31:56 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)
Stefan Kangas <stefankangas@gmail.com> writes:
> I like the idea of adding an entirely new built-in API based on the
> current state of the art. I would begin such a project by looking into
> what other Lisps are doing, such as CL, Clojure, Guile and Racket. Why
> shouldn't Emacs Lisp be best-in-class?
Sure.
Common Lisp doesn't have regexps, but (some) implementations do, and
there's a bunch of libraries, like http://edicl.github.io/cl-ppcre/
I'm not much in favour:
* (scan "(a)*b" "xaaabd")
1
5
#(3)
#(4)
* (let ((s (create-scanner "(([a-c])+)x")))
    (scan s "abcxy"))
0
4
#(0 2)
#(3 3)
And since it's Common Lisp, of course you have special forms for
destructuring:
* (register-groups-bind (first second third fourth)
      ("((a)|(b)|(c))+" "abababc" :sharedp t)
    (list first second third fourth))
("c" "a" "b" "c")
Guile: https://www.gnu.org/software/guile/manual/html_node/Regexp-Functions.html
(string-match "[0-9][0-9][0-9][0-9]" "blah2002")
⇒ #("blah2002" (4 . 8))
(map match:substring (list-matches "[a-z]+" "abc 42 def 78"))
⇒ ("abc" "def")
Clojure: https://purelyfunctional.tv/mini-guide/regexes-in-clojure/
(re-matches #"abc(.*)" "abcxyz")
["abcxyz" "xyz"]
I.e., if the regexp has no groups, we return the matched substring,
otherwise a vector of the whole match followed by the groups. It's nice
in one way, but the cleverness leads to errors when (re-)writing code.
(subs (re-find #"[a-z]+" "fooo baar") 3)
but then you add a group or two and you have to rewrite to something like:
(let [[_ s1 s2] (re-matches #"([a-z]+) ([a-z]+)" full-name)]
(subs s1 3))
I hate that.
The thing that makes looking at other languages here slightly less
useful is that Emacs has buffers. We're often not interested in the
(sub-)matches themselves at all, but instead their buffer positions
(i.e., match-beginning/end).
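To make the buffer point concrete, here is a minimal sketch that uses only
existing functions (`re-search-forward', `match-beginning', `match-end').
The helper name `my-match-positions' is made up for illustration and is
not part of Emacs. It mirrors the Guile `list-matches' example above, but
hands back positions instead of substrings:

```elisp
;; Hypothetical helper, not part of Emacs: collect the buffer
;; positions of every match of REGEXP in the current buffer.
(defun my-match-positions (regexp)
  "Return a list of (BEG . END) buffer positions for matches of REGEXP."
  (save-excursion
    (goto-char (point-min))
    (let (positions)
      ;; NOERROR is t, so the loop ends quietly when nothing more matches.
      (while (re-search-forward regexp nil t)
        (push (cons (match-beginning 0) (match-end 0)) positions))
      (nreverse positions))))

(with-temp-buffer
  (insert "abc 42 def 78")
  (my-match-positions "[0-9]+"))
;; => ((5 . 7) (12 . 14))
```

The caller can then delete, replace or fontify those regions directly,
which a list of substrings would not allow.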
> As for naming, how about just using a short prefix such as "re-"?
> AFAICT, we currently have only five functions using that prefix.
Sure.
> Tangentially, I have always been wondering if it's feasible to add a new
> regular expression type to `read' where you don't have to incessantly
> double quote all special characters. (One could take inspiration from
> Python, for example, which adds an "r" character to strings to turn them
> into regexps: r"regexp".)
I'm all for adding a regexp object type (and a new read syntax), but I
think it's somewhat orthogonal? Not totally, though: I've long wished
for match/searching functions to be generic, and work differently on
strings and regexps. That is, if fed a string, then do comparison with
`string-equal' and when fed a regexp, do the comparison with
`string-match'.
So you could say
(search-forward "foo")
and
(search-forward #r"fo+")
or
(search-forward (re-make "fo+"))
-- no reason for there to be separate functions if we have regexp objects.
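A rough sketch of how that dispatch could look with today's tools, assuming
a regexp object is just a struct wrapping the pattern string. The names
`my-re', `my-re-make' and `my-search-forward' are hypothetical, and nothing
here is an existing or proposed Emacs API. The #r"..." read syntax would
need reader support and is not emulated:

```elisp
(require 'cl-lib)

;; Hypothetical regexp object: a struct wrapping the pattern string.
(cl-defstruct (my-re (:constructor my-re-make (pattern)))
  pattern)

(cl-defgeneric my-search-forward (what &optional bound noerror count)
  "Search forward from point for WHAT, a string or a `my-re' object.")

;; A plain string is searched for literally.
(cl-defmethod my-search-forward ((what string) &optional bound noerror count)
  (search-forward what bound noerror count))

;; A regexp object is searched for as a regexp.
(cl-defmethod my-search-forward ((what my-re) &optional bound noerror count)
  (re-search-forward (my-re-pattern what) bound noerror count))

(with-temp-buffer
  (insert "fo+ fooo")
  (goto-char (point-min))
  (list (my-search-forward "fo+")               ; literal "fo+"
        (progn (goto-char (point-min))
               (my-search-forward (my-re-make "fo+")))))  ; regexp, matches "fo"
;; => (4 3)
```

The same call ends up at different positions depending only on the type
of its argument, which is the behaviour described above.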
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no