emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

prior work on non-backtracking regex engine?


From: Danny McClanahan
Subject: prior work on non-backtracking regex engine?
Date: Sun, 10 Mar 2024 15:41:59 +0000

Hello emacs-devel,

I've been learning a lot about regex implementations recently and was 
interested in learning:

(1) What kinds of inputs does the current emacs regex engine do poorly on?
  This "Regexp Problems" page is super helpful 
(https://www.gnu.org/software/emacs/manual/html_node/elisp/Regexp-Problems.html),
 but I wanted to know if there was anywhere else e.g. in the source I should 
look for further notes on current performance TODOs? Is there a benchmark suite?

(2) What feature requirements does emacs impose of the regex engine?
  In particular, someone on the internet claimed (without citing any sources) 
several years ago 
(https://github.com/google/re2/issues/127#issuecomment-267676672) that the 
emacs regex engine needs the ability to search through non-contiguous text data 
to incorporate the gap buffer (which re2 along with many other engines do not 
support). When I recently asked the rust regex maintainer about this idea 
(https://github.com/rust-lang/regex/discussions/1167#discussioncomment-8585027),
 I found that emacs's ability to dynamically rebind word \< and symbol \_< 
boundaries according to mode is also unsupported by many general-purpose 
engines which e.g. only support Unicode word boundaries.

  I am also aware emacs supports backrefs in pattern strings, which cannot be 
implemented without some level of backtracking, but I am hoping this 
functionality can be implemented on top of an otherwise non-backtracking engine 
and retain the performance improvements.

(3) Have there been prior investigations of non-backtracking regex engines in 
emacs, or trying to use an external regex engine in general? What was the 
outcome, and does it seem like a useful research direction?

Thanks,
Danny



reply via email to

[Prev in Thread] Current Thread [Next in Thread]