From: Alan Mackenzie
Subject: bug#19873: Ill-formed regular expression is constructed in forward-paragraph.
Date: Thu, 9 Mar 2017 21:04:45 +0000
User-agent: Mutt/1.7.2 (2016-11-26)

Hello, Marcin.

On Sun, Feb 26, 2017 at 17:44:51 +0100, Marcin Borkowski wrote:
> On 2015-02-15, at 10:31, Alan Mackenzie <address@hidden> wrote:

> > Hello, Emacs!

> > In forward-paragraph, L37, a regular expression is constructed as
> > follows:

> > (let* ...
> >  (sp-parstart (concat "^[ \t]*\\(?:" parstart "\\|" parsep "\\)"))
> >  ...)

> > .  Here parstart and parsep are, more or less,
> > paragraph-{start,separate}.

> > The problem is that parstart and parsep themselves are likely to begin
> > with "[ \t]*" (the default values certainly do), so we have two
> > consecutive matchers for an arbitrary amount of whitespace.  This causes
> > the regexp engine to run very slowly when a line starts with lots of WS
> > but doesn't match.

> > This problem seems to be the cause of bug # 19846 (where holding down the
> > spacebar inside a C comment causes Emacs to seize up when auto-fill mode
> > is enabled).

> Hi Alan, hi all,

> I put this bug on my todo-list some time ago and decided now to revisit
> it.

> I'm wondering what could be done about it.  First of all, my Emacs has
> this as paragraph-start:

> "\\|[        ]*$"

> and this as paragraph-separate:

> "[    ]*$"

> and frankly speaking, I'm not sure why they differ at all (by default).
> Also, even though forward-paragraph checks for "^" at their beginning,
> they actually don't begin with that character (again, by default).

> My first thought is to add a check whether paragraph-start and
> paragraph-separate match something like

> "^\\^?\\[[[:space:]]+\\][+*]?"

> and if yes, make parstart/parsep equal to them, but without the matching
> part.


My first reaction is "This is a good idea, but be very careful!".  For
example, if paragraph-start and/or paragraph-separate begins with
"[ \t]+" (i.e. the paragraph start requires whitespace at BOL), you will
lose that requirement by stripping matches of
"^\\^?\\[[[:space:]]+\\][+*]?" from them.

I think this idea is workable, but you'll have to check for one or both
of paragraph-s{tart,eparate} starting with "[ \t]+".  A good strategy
here might be to begin the target regexp with "^[ \t]*", then begin one
or both components with "[ \t]" (without the "*").
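
A rough sketch of that strategy follows.  It is an illustration only,
not a proposed patch: the names my-strip-leading-ws and
my-build-sp-parstart are made up here, and the code assumes the leading
matcher is spelt exactly "[ \t]*" or "[ \t]+".  It looks only at the very
start of each regexp, so a whitespace matcher in a later alternative
(after a "\\|") is left alone.

```elisp
;; Hypothetical helpers for illustration; they are not part of paragraphs.el.

(defun my-strip-leading-ws (re)
  "Factor a leading whitespace matcher out of the regexp string RE."
  (let ((star "[ \t]*")
        (plus "[ \t]+"))
    (cond
     ;; Optional leading whitespace: the caller's "^[ \t]*" already covers it.
     ((string-prefix-p star re)
      (substring re (length star)))
     ;; Mandatory leading whitespace: keep exactly one "[ \t]", so at least
     ;; one whitespace character is still required.
     ((string-prefix-p plus re)
      (concat "[ \t]" (substring re (length plus))))
     ;; Anything else is returned unchanged.
     (t re))))

(defun my-build-sp-parstart (parstart parsep)
  "Combine PARSTART and PARSEP behind a single leading \"^[ \\t]*\"."
  (concat "^[ \t]*\\(?:"
          (my-strip-leading-ws parstart)
          "\\|"
          (my-strip-leading-ws parsep)
          "\\)"))
```

With these definitions, (my-build-sp-parstart "[ \t]*$" "[ \t]+-") should
be equal to "^[ \t]*\\(?:$\\|[ \t]-\\)", i.e. only one matcher for an
arbitrary amount of whitespace remains.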

There may be other gotchas which I haven't thought about yet.

One needs a twisted mind to do this sort of thing properly, so I offer my
services to review your upcoming patch.  ;-)

> -- 
> Marcin Borkowski
> http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
> Faculty of Mathematics and Computer Science
> Adam Mickiewicz University

Alan Mackenzie (Nuremberg, Germany).
