[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: (error "Stack overflow in regexp matcher") and (?)wrong display of r

From: Mattias Engdegård
Subject: Re: (error "Stack overflow in regexp matcher") and (?)wrong display of regexp in backtrace
Date: Sun, 15 Mar 2020 21:06:46 +0100

15 mars 2020 kl. 17.57 skrev Alan Mackenzie <address@hidden>:

> I agree (having tried "\\(?:" in place of "\\("), but why?  What is
> causing the recursion here?  Each of the two groups need only remember
> the latest string matching it.  Surely?  I'd like some insight into
> this, so as to avoid it happening again.

Each time a capture group is entered, the old extent of that group is pushed 
onto the stack so that it can be reloaded if a match failure occurs and the 
engine needs to backtrack. This means that removing the unnecessary group is 
not an asymptotic improvement in space here, but reduces the constant factor so 
that you can match larger strings. It's still O(N) space, N being the string 

You could probably rewrite the regexp to improve the constant further by taking 
advantage of the fact that the [^\\\n\r] branch matches much more often than 
the other. Something like

(rx (* (seq "\\" anything))
    (* (+ (not (any "\n\r\\")))
       (seq "\\" anything))
    (* (not (any "\n\r\\"))))

which could perhaps be trimmed a bit. In any case, removing unnecessary capture 
groups everywhere is a cheap and easy improvement to start with. (Another 
benefit of rx, by the way -- there tends to be a lot fewer of those.)

reply via email to

[Prev in Thread] Current Thread [Next in Thread]