

Re: Documentation on debugging regexp performance

From: Clément Pit--Claudel
Subject: Re: Documentation on debugging regexp performance
Date: Thu, 21 Jan 2016 11:37:48 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1

On 01/21/2016 10:27 AM, Alan Mackenzie wrote:
> Hello, Clément.

Hi Alan!

> "   +[^:=]+ +:=?" is an ill-formed regexp - if you get lots of spaces in
> a non-match, the Emacs regexp engine will try all possible ways of
> matching these spaces before giving up.  You have three concatenated
> sub-expressions, all of which match any number of spaces, namely:
>    " +[^:=]+ +"
>     1122222233
> I would suggest reformulating it thus:
>    " +[^:= ][^:=]+ "
>     112222223333334
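To put a number on Alan's point: a run of n spaces can be divided among the three sub-expressions ` +`, `[^:=]+` and ` +` in (n-1)(n-2)/2 ways. Each sub-expression takes at least one space, and `[^:=]+` can take spaces because a space is neither `:` nor `=`. The minimal Python sketch below counts the divisions by brute force and checks them against that formula. It only counts possibilities. It is not a model of the order in which Emacs's engine actually tries them, and `count_splits` is a name made up for this illustration.

```python
from math import comb


def count_splits(n: int) -> int:
    """Count ways to divide a run of n spaces among " +", "[^:=]+" and " +".

    Each sub-expression must take at least one space; "[^:=]+" may take
    spaces because a space is neither ':' nor '='.
    """
    count = 0
    for first in range(1, n + 1):                # spaces for the leading " +"
        for middle in range(1, n - first + 1):   # spaces for "[^:=]+"
            last = n - first - middle            # the rest go to the trailing " +"
            if last >= 1:
                count += 1
    return count


for n in (3, 10, 100):
    print(n, count_splits(n), comb(n - 1, 2))
# 3 1 1
# 10 36 36
# 100 4851 4851
```

The count grows quadratically with the length of the run. When the match then fails, a search moves on and repeats similar work from the next starting position, so a long run of spaces in a non-matching line gets expensive quickly.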

I think this has different semantics: my original regexp requires at least
three leading spaces. But I think prepending two spaces to yours fixes that.
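You can check the three-space point by running the patterns through Python's `re` module, which is also a backtracking engine. These particular patterns use no backslash constructs, so they are spelled the same way in Python and in Emacs. `matches` is a hypothetical helper for this sketch, and `re.match` is anchored at the start of the string, roughly like `looking-at` at the beginning of a line.

```python
import re

# Patterns copied from the thread (no backslash constructs, so the
# Emacs and Python spellings coincide).
ORIGINAL = "   +[^:=]+ +:=?"
ALAN = " +[^:= ][^:=]+ "
PREFIXED = "  " + ALAN  # two extra literal spaces: at least three in total


def matches(pattern: str, text: str) -> bool:
    # re.match only tries a match starting at position 0.
    return re.match(pattern, text) is not None


print(matches(ORIGINAL, "  xy :"))   # False: only two leading spaces
print(matches(ALAN, " xy "))         # True: one leading space is enough
print(matches(PREFIXED, " xy "))     # False
print(matches(PREFIXED, "   xy "))   # True
```

The prepended version still differs in one edge case. It needs at least two characters before its final space, so `matches(PREFIXED, "   x :")` is False, while `matches(ORIGINAL, "   x :")` is True.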

> Subexpression 1 matches ALL the leading spaces.  Subexp 2 is exactly one
> character which can't be a space.  Subexp 3 matches almost anything,
> including spaces, and subexp 4 matches a single space at the end (to make
> sure there is at least one space there).
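You can see Alan's numbering directly by wrapping each sub-expression of his reformulation in its own group. This Python sketch does that. The groups are added only for illustration and are not part of the suggested regexp. Subexpression 2 cannot match a space, so subexpression 1 has only one viable way to match the leading spaces: all of them.

```python
import re

# Alan's reformulation, with one illustrative group per numbered
# sub-expression: (1) leading spaces, (2) one non-space character,
# (3) almost anything, (4) a single final space.
PATTERN = re.compile("( +)([^:= ])([^:=]+)( )")

m = PATTERN.match("   foo bar :")
print(m.groups())  # ('   ', 'f', 'oo bar', ' ')
```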

This is helpful, thanks! I realize, however, that I may have oversimplified. The
issue is that what I really want is something like this:

"   +\\([^:=]+\\) +:=?"

IOW, I want to capture that first group.
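Emacs writes the group as `\\(...\\)` inside a Lisp string; Python writes it as plain parentheses. A quick check with Python's `re`, another backtracking engine, shows one subtlety of this pattern. `[^:=]+` is greedy and can match spaces, so the group keeps all but the last of the spaces before the colon. The non-greedy `+?` variant below is a hypothetical alternative, not something proposed in the thread (Emacs regexps also provide `+?`). It changes what the group captures. It is not meant to reduce the backtracking on a failing line.

```python
import re

# Clément's pattern, with the Emacs group \\( ... \\) written as ( ... ).
GREEDY = re.compile("   +([^:=]+) +:=?")
# Hypothetical variant: a non-greedy group, which stops before the spaces
# that precede the colon.
LAZY = re.compile("   +([^:=]+?) +:=?")

line = "   foo bar  :="
print(repr(GREEDY.match(line).group(1)))  # 'foo bar '
print(repr(LAZY.match(line).group(1)))    # 'foo bar'
```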

> All the best with your regexp!

Thanks. Your points about backtracking were helpful as well. Do you know if
there are technical reasons why Emacs uses a backtracking implementation for
regexps like this one (instead of compiling them to a linear-time matcher)?
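Here is a sketch of the kind of linear-time matcher the question refers to. The pattern `   +[^:=]+ +:=?` is translated by hand into a small NFA and simulated by tracking the set of live states (Thompson's approach), so each input character is examined once. The state numbering and the `step`/`nfa_match` names are made up for this illustration. A real engine would build the automaton from the regexp automatically, and reporting the capture group needs extra bookkeeping (as in Pike's VM), which this sketch omits. Backreferences such as `\\1` also cannot be handled this way, since they go beyond regular languages.

```python
import re

# Hand-built NFA for "   +[^:=]+ +:=?", anchored at the start of the text.
# States 0-2 consume the three mandatory leading spaces.
AFTER_SPACES, FIELD, TRAILING, ACCEPT = 3, 4, 5, 6


def step(state: int, ch: str) -> set[int]:
    """Return the set of states reachable from `state` by consuming `ch`."""
    if state < AFTER_SPACES:
        return {state + 1} if ch == " " else set()
    nxt = set()
    if state == AFTER_SPACES:
        if ch == " ":
            nxt.add(AFTER_SPACES)      # still in the leading " +"
        if ch not in ":=":
            nxt.add(FIELD)             # first character of "[^:=]+"
    elif state == FIELD:
        if ch not in ":=":
            nxt.add(FIELD)             # "[^:=]+" continues (spaces included)
        if ch == " ":
            nxt.add(TRAILING)          # first space of the trailing " +"
    elif state == TRAILING:
        if ch == " ":
            nxt.add(TRAILING)
        elif ch == ":":
            nxt.add(ACCEPT)            # "=?" is optional, so ':' completes a match
    return nxt


def nfa_match(text: str) -> bool:
    """True if some prefix of `text` matches; one pass, no backtracking."""
    states = {0}
    for ch in text:
        states = set().union(*(step(s, ch) for s in states))
        if ACCEPT in states:
            return True
        if not states:
            return False
    return False


# Cross-check against Python's backtracking re.match on a few inputs.
regexp = re.compile("   +[^:=]+ +:=?")
for text in ["   foo bar  :=", "  x :", "      :", "   a=b :", " " * 50]:
    assert nfa_match(text) == (regexp.match(text) is not None), text
print(nfa_match(" " * 100_000))  # False, after a single linear pass
```

The live-state set never holds more than three states here, so the work per character is bounded regardless of how many spaces the line contains.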


