Re: (error "Stack overflow in regexp matcher") and (?)wrong display of r

From: Alan Mackenzie
Subject: Re: (error "Stack overflow in regexp matcher") and (?)wrong display of regexp in backtrace
Date: Sun, 15 Mar 2020 16:57:15 +0000
User-agent: Mutt/1.10.1 (2018-07-13)

Hello, Mattias.

On Sun, Mar 15, 2020 at 13:22:20 +0100, Mattias Engdegård wrote:
> On 15 March 2020 at 11:39, Alan Mackenzie <address@hidden> wrote:

> Hello Alan. Thanks for the nice example!

> > First of all, note the regexp, "\\(\\\\\\(.\\|\n\\)\\|[^\\\n\15]\\)*"

> > In the source, the "\15" is "\r".  Why is this substitution being made
> > for the backtrace?  Is it intentional (in which case, why not do the
> > same to the "\n"?), or is it a bug?  To me, it is more like a bug.

> I agree; there are some ad-hoc switches like print-escape-newlines
> (which only works on \n and \f) and print-escape-control-characters
> (which produces octal), but nothing that gives human-friendly escapes
> for other known control characters.
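The effect of the two switches mentioned above can be checked with a small sketch. The helper name my-print-variants is made up for illustration, and the code assumes an Emacs recent enough to provide print-escape-control-characters.

```elisp
(defun my-print-variants (string)
  "Return STRING printed three ways: default, with newlines escaped,
and with both newlines and other control characters escaped.
Hypothetical helper, for illustration only."
  (list
   ;; Default: control characters are emitted literally.
   (prin1-to-string string)
   ;; Newline (and formfeed) become backslash escapes; CR stays literal.
   (let ((print-escape-newlines t))
     (prin1-to-string string))
   ;; Remaining control characters become octal escapes, so CR
   ;; shows up as \15 rather than \r.
   (let ((print-escape-newlines t)
         (print-escape-control-characters t))
     (prin1-to-string string))))

(my-print-variants "a\nb\rc")
```

With both variables bound, the carriage return is printed in octal, which matches the "\15" seen in the backtrace.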


> > More importantly, why is there a stack overflow here at all?  Even
> > though the regexp matcher has a long, long piece of buffer to scan over,
> > the regexp is a simple linear search, without any nesting to speak of.

> Let's ask xr for help:

> (xr-pp "\\(\\\\\\(.\\|\n\\)\\|[^\\\n\15]\\)*")
> =>
> (zero-or-more
>  (group
>   (or (seq "\\"
>            (group anything))
>       (not (any "\n\r\\")))))

> (note that xr pretty-prints \r properly)

> There are two capture groups here, neither of which is actually used.
> Remove them (the outer one in particular) and the regexp no longer
> overflows.

I agree (having tried "\\(?:" in place of "\\("), but why?  What is
causing the recursion here?  Each of the two groups need only remember
the latest string matching it.  Surely?  I'd like some insight into
this, so as to avoid it happening again.
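For anyone wanting to reproduce the comparison, here is a minimal sketch that runs both forms of the regexp over a long string and reports either the match end or the error signalled. The helper my-try-regexp and the text length are assumptions made for illustration; whether the capturing form actually overflows will depend on the Emacs version and on the length of the text.

```elisp
(defun my-try-regexp (regexp text)
  "Match REGEXP against TEXT from the start.
Return the end of the match, or the error object if one is signalled.
Hypothetical helper, for illustration only."
  (condition-case err
      (progn (string-match regexp text)
             (match-end 0))
    (error err)))

(let ((text (make-string 200000 ?x))   ; arbitrary length, an assumption
      ;; The regexp from the backtrace, with \r written as in the source.
      (capturing "\\(\\\\\\(.\\|\n\\)\\|[^\\\n\r]\\)*")
      ;; The same regexp with both groups made shy.
      (shy "\\(?:\\\\\\(?:.\\|\n\\)\\|[^\\\n\r]\\)*"))
  (list (my-try-regexp capturing text)
        (my-try-regexp shy text)))
```

On short input the two forms match the same text; the sketch is only meant to show where they diverge on long input.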

[ .... ]

I actually changed the regexp to one which searches for what I'm looking
for (a non-escaped newline or EOB) from the regexp here (which matches
everything which I'm not looking for).  I might even time the two
approaches and see which is faster.
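The inverted approach can be sketched roughly as follows. This is not the actual replacement regexp, which is not shown in this message; it is a hypothetical function, my-skip-to-unescaped-newline, that uses a plain string search plus a backslash count instead. It assumes that a newline preceded by an odd number of backslashes counts as escaped, and, unlike the original regexp, it does not stop at a carriage return.

```elisp
(defun my-skip-to-unescaped-newline ()
  "Move point to the next unescaped newline, or to end of buffer.
A newline is taken to be escaped when an odd number of backslashes
immediately precedes it.  Return the new value of point.
Hypothetical sketch, for illustration only."
  (let ((start (point))
        (done nil))
    (while (not done)
      (if (not (search-forward "\n" nil 'move))
          ;; No newline left: `move' has already put point at EOB.
          (setq done t)
        (let ((nl (1- (point))))
          ;; `skip-chars-backward' returns zero or a negative count;
          ;; an even count means the newline is not escaped.
          (when (zerop (% (save-excursion
                            (goto-char nl)
                            (skip-chars-backward "\\\\" start))
                          2))
            (goto-char nl)
            (setq done t)))))
    (point)))
```

Because this does no backtracking, the regexp matcher's failure stack does not come into play. Either approach could be timed by wrapping the call in benchmark-run.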

Alan Mackenzie (Nuremberg, Germany).
