Re: Documentation on debugging regexp performance
From: Clément Pit--Claudel
Subject: Re: Documentation on debugging regexp performance
Date: Thu, 21 Jan 2016 11:37:48 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1
On 01/21/2016 10:27 AM, Alan Mackenzie wrote:
> Hello, Clément.
Hi Alan!
> " +[^:=]+ +:=?" is an ill-formed regexp: if you get lots of spaces in
> a non-match, the Emacs regexp engine will try all possible ways of
> matching these spaces before giving up. You have three concatenated
> sub-expressions, all of which can match a run of one or more spaces, namely:
>
> " +[^:=]+ +"
> 1122222233
>
> I would suggest reformulating it thus:
>
> " +[^:= ][^:=]+ "
> 112222223333334
I think this has different semantics: my original regexp requires at least
three spaces. But I think prepending spaces to yours fixes that.
>
> Subexpression 1 matches ALL the leading spaces. Subexp 2 is exactly one
> character which can't be a space. Subexp 3 matches almost anything,
> including spaces, and subexp 4 matches a single space at the end (to make
> sure there is at least one space there).
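To make the point about backtracking concrete, here is a toy backtracking matcher in Python that counts individual character tests. It is a sketch under stated assumptions, not Emacs's regexp engine: patterns are hand-compiled into lists of atoms, the names count_steps, ORIGINAL and REWRITTEN are illustrative, and ":=?" is reduced to ":" because the optional "=" cannot change whether a match exists. It compares " +[^:=]+ +:" with " +[^:= ][^:=]+ :" (the suggested shape with a colon appended for comparison) on a line made only of spaces.

```python
from typing import Callable, List, Tuple

# Toy model, NOT Emacs's engine: an atom is (predicate, repeat), where
# repeat=True means greedy "one or more" (+) and False means exactly one char.
Atom = Tuple[Callable[[str], bool], bool]


def count_steps(atoms: List[Atom], text: str) -> Tuple[bool, int]:
    """Naive backtracking search over every start position, like re.search.

    Returns (found, steps), where steps counts individual character tests.
    """
    steps = 0

    def match(i: int, pos: int) -> bool:
        nonlocal steps
        if i == len(atoms):
            return True
        pred, repeat = atoms[i]
        if not repeat:
            steps += 1
            return pos < len(text) and pred(text[pos]) and match(i + 1, pos + 1)
        # Greedy +: take the longest run, then give characters back one by one.
        end = pos
        while end < len(text):
            steps += 1
            if not pred(text[end]):
                break
            end += 1
        for stop in range(end, pos, -1):  # every stop keeps >= 1 char matched
            if match(i + 1, stop):
                return True
        return False

    for start in range(len(text) + 1):
        if match(0, start):
            return True, steps
    return False, steps


space = lambda c: c == " "
not_colon_eq = lambda c: c not in ":="
not_colon_eq_space = lambda c: c not in ":= "
colon = lambda c: c == ":"

# " +[^:=]+ +:" has three adjacent subexpressions that can all eat spaces.
ORIGINAL = [(space, True), (not_colon_eq, True), (space, True), (colon, False)]
# " +[^:= ][^:=]+ :" is the suggested shape, with ":" appended.
REWRITTEN = [(space, True), (not_colon_eq_space, False),
             (not_colon_eq, True), (space, False), (colon, False)]

if __name__ == "__main__":
    for n in (10, 20, 40):
        line = " " * n  # a non-matching line made only of spaces
        print(n, count_steps(ORIGINAL, line)[1], count_steps(REWRITTEN, line)[1])
```

In this model both patterns fail on an all-space line, but the original's count grows roughly with the fourth power of the line length (each start position times each three-way split of the spaces), while the rewrite's grows roughly quadratically, since subexpression 2 can never match a space.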
This is helpful, thanks! I realize, however, that I may have oversimplified. The
issue is that what I really want is something like this:
" +\\([^:=]+\\) +:=?"
IOW, I want to capture the text matched by that group.
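A possible way to keep the capture while applying the same idea is to make the group start and end with a character that cannot be a space, so the space runs on either side belong only to the surrounding " +". This is a sketch using Python's re module (also a backtracking engine), and it assumes the goal is the middle text with surrounding spaces trimmed; the names TRIMMED and capture are hypothetical. In Emacs Lisp string syntax the same pattern would presumably read " +\\([^:= ]\\(?:[^:=]*[^:= ]\\)?\\) +:=?".

```python
import re
from typing import Optional

# Original shape: the greedy group and the following " +" both accept spaces,
# so the group keeps every trailing space except the last one.
ORIGINAL = re.compile(r" +([^:=]+) +:=?")

# Hypothetical restructuring: the group must start and end with a character
# that is neither ":", "=" nor a space, so the spaces on both sides are matched
# only by the surrounding " +" runs. A one-character group is also allowed.
TRIMMED = re.compile(r" +([^:= ](?:[^:=]*[^:= ])?) +:=?")


def capture(pattern: re.Pattern, line: str) -> Optional[str]:
    """Return the captured middle text, or None if the line does not match."""
    m = pattern.search(line)
    return m.group(1) if m else None


if __name__ == "__main__":
    line = "  foo bar   :="
    print(repr(capture(ORIGINAL, line)))  # 'foo bar  '
    print(repr(capture(TRIMMED, line)))   # 'foo bar'
```

The inner "[^:=]*" can still backtrack, but it only hands over to the trailing " +" at a non-space character, so the leading and trailing spaces are no longer split among several subexpressions. Note also the difference in what is captured: the original keeps all but one of the trailing spaces.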
> All the best with your regexp!
Thanks. Your points about backtracking were helpful as well. Do you know if
there are technical reasons why Emacs chooses a backtracking implementation for
this regexp (instead of compiling it to a linear-time matcher)?
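For illustration only, here is what a linear-time alternative can look like: a Thompson-style NFA simulation that advances a set of states over the input instead of backtracking. It is a sketch in Python with hand-compiled atoms (X+ rewritten as X X*), the names plus, closure, nfa_search and PATTERN are hypothetical, it only answers whether a match exists (no capture positions), and it omits the trailing "=?", which cannot change that answer. A known limitation of this technique is that it cannot handle backreferences such as \\1, which Emacs regexps support; that is an observation about the technique, not a statement of why Emacs chose its design.

```python
from typing import Callable, List, Set, Tuple

Pred = Callable[[str], bool]
# Hand-compiled atom: ("one", p) matches exactly one char satisfying p,
# ("star", p) matches zero or more such chars.
Atom = Tuple[str, Pred]


def plus(p: Pred) -> List[Atom]:
    """Rewrite X+ as X X*."""
    return [("one", p), ("star", p)]


def closure(states: Set[int], atoms: List[Atom]) -> Set[int]:
    """Add states reachable without consuming input (skipping a "star" atom)."""
    out = set(states)
    stack = list(states)
    while stack:
        i = stack.pop()
        if i < len(atoms) and atoms[i][0] == "star" and i + 1 not in out:
            out.add(i + 1)
            stack.append(i + 1)
    return out


def nfa_search(atoms: List[Atom], text: str) -> bool:
    """Unanchored search in O(len(text) * len(atoms)) steps, never backtracking."""
    accept = len(atoms)
    current = closure({0}, atoms)
    if accept in current:
        return True
    for c in text:
        nxt: Set[int] = set()
        for i in current:
            if i == accept:
                continue
            kind, p = atoms[i]
            if p(c):
                # A "star" atom may keep consuming; a "one" atom advances.
                nxt.add(i if kind == "star" else i + 1)
        # Re-add state 0 so that a match may also start at the next position.
        current = closure(nxt | {0}, atoms)
        if accept in current:
            return True
    return False


# " +[^:=]+ +:" (the optional trailing "=" cannot change whether a match exists)
PATTERN = (plus(lambda c: c == " ")
           + plus(lambda c: c not in ":=")
           + plus(lambda c: c == " ")
           + [("one", lambda c: c == ":")])

if __name__ == "__main__":
    print(nfa_search(PATTERN, "  foo  :"))     # True
    print(nfa_search(PATTERN, " " * 100_000))  # False, after a single pass
```

Each input character is examined once against at most len(atoms) live states, so a long run of spaces costs time proportional to its length, whatever the pattern's ambiguity.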
Clément.