[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
prior work on non-backtracking regex engine?
From: |
Danny McClanahan |
Subject: |
prior work on non-backtracking regex engine? |
Date: |
Sun, 10 Mar 2024 15:41:59 +0000 |
Hello emacs-devel,
I've been learning a lot about regex implementations recently and was
interested in learning:
(1) What kinds of inputs does the current emacs regex engine do poorly on?
This "Regexp Problems" page is super helpful
(https://www.gnu.org/software/emacs/manual/html_node/elisp/Regexp-Problems.html),
but I wanted to know if there was anywhere else e.g. in the source I should
look for further notes on current performance TODOs? Is there a benchmark suite?
(2) What feature requirements does emacs impose of the regex engine?
In particular, someone on the internet claimed (without citing any sources)
several years ago
(https://github.com/google/re2/issues/127#issuecomment-267676672) that the
emacs regex engine needs the ability to search through non-contiguous text data
to incorporate the gap buffer (which re2 along with many other engines do not
support). When I recently asked the rust regex maintainer about this idea
(https://github.com/rust-lang/regex/discussions/1167#discussioncomment-8585027),
I found that emacs's ability to dynamically rebind word \< and symbol \_<
boundaries according to mode is also unsupported by many general-purpose
engines which e.g. only support Unicode word boundaries.
I am also aware emacs supports backrefs in pattern strings, which cannot be
implemented without some level of backtracking, but I am hoping this
functionality can be implemented on top of an otherwise non-backtracking engine
and retain the performance improvements.
(3) Have there been prior investigations of non-backtracking regex engines in
emacs, or trying to use an external regex engine in general? What was the
outcome, and does it seem like a useful research direction?
Thanks,
Danny
- prior work on non-backtracking regex engine?,
Danny McClanahan <=