bug#61514: 30.0.50; sadistically long xml line hangs emacs


From: Eli Zaretskii
Subject: bug#61514: 30.0.50; sadistically long xml line hangs emacs
Date: Mon, 20 Feb 2023 14:19:18 +0200

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: "Mark A. Hershberger" <mah@everybody.org>,  61514@debbugs.gnu.org
> Date: Sun, 19 Feb 2023 18:48:43 -0500
> 
> > The problem is in the combination of nxml-mode and some subtle
> > bug/misfeature in our regexp routines.  Specifically, when we overflow
> > the fail stack, we fail to recover in this case, and seem to infloop
> > inside re_match_2_internal, or maybe recover very inefficiently (I
> > waited for almost 1 hour before giving up).  The call which causes the
> > loop is in xmltok.el, in the indicated line:
> >
> > (defun xmltok-scan-attributes ()
> >   (let ((recovering nil)
> >     (atts-needing-normalization nil))
> >     (while (cond ((or (looking-at (xmltok-attribute regexp))
> >                   ;; use non-greedy group
> >                   (when (looking-at (concat "[^<>\n]+?"  <<<<<<<<<<<<<<<<<
> >                                             (xmltok-attribute regexp)))
> >                     (unless recovering
> >                       (xmltok-add-error "Malformed attribute"
> >                                         (point)
> >                                         (save-excursion
> >                                           (goto-char (xmltok-attribute start name))
> >                                           (skip-chars-backward "\r\n\t ")
> >                                           (point))))
> >                     t))
> >
> > The regexp that causes this is as follows:
> >
> >   "[^<>\n]+?\\(\\(?:\\(xmlns\\)\\|[_[:alpha:]][-._[:alnum:]]*\\)\\(:[_[:alpha:]][-._[:alnum:]]*\\)?\\)[ \r\t\n]*=\\(?:[ \r\t\n]*\\('[^<'&\r\n\t]*\\([&\r\n\t][^<']*\\)?'\\|\"[^<\"&\r\n\t]*\\([&\r\n\t][^<\"]*\\)?\"\\)\\(?:\\([ \r\t\n]*>\\)\\|\\(?:\\([ \r\t\n]*/\\)\\(>\\)?\\)\\|\\([ \r\t\n]+\\)\\)\\)?"
> 
> IIUC the above describes the code where we're stuck inf-looping inside
> `looking-at`?

Not inflooping, but very slowly backtracking, or so it seems.

> Is it the same place where the regexp-stack overflow happens (and with
> the same regexp)?

It's (almost) the same place, but not the same regexp.  The regexp
which causes the stack-overflow message (which is emitted from
set-auto-mode, before entering redisplay) is this:

  "\\(\\(?:\\(xmlns\\)\\|[_[:alpha:]][-._[:alnum:]]*\\)\\(:[_[:alpha:]][-._[:alnum:]]*\\)?\\)[ \r\t\n]*=\\(?:[ \r\t\n]*\\('[^<'&\r\n\t]*\\([&\r\n\t][^<']*\\)?'\\|\"[^<\"&\r\n\t]*\\([&\r\n\t][^<\"]*\\)?\"\\)\\(?:\\([ \r\t\n]*>\\)\\|\\(?:\\([ \r\t\n]*/\\)\\(>\\)?\\)\\|\\([ \r\t\n]+\\)\\)\\)?"

As you can see, the prepended "[^<>\n]+?" in the regexp which "hangs"
makes all the difference.  So the looking-at which fails reasonably
quickly is the first call to looking-at above, whereas the one that
"hangs" is the second one.  Maybe this points out a way out of this
misery?
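
For anyone who wants to compare the two calls in isolation, here is a
minimal sketch.  It is not the nxml-mode code: demo-attr-regexp is a
hypothetical, much-simplified stand-in for (xmltok-attribute regexp),
and demo-time-match and demo-time-both are names made up for this
illustration.  It only times the unprefixed and the prefixed regexp
against one long line that contains no attribute, so it may or may not
reproduce the actual hang.

```elisp
(require 'benchmark)

;; Hypothetical, simplified stand-in for (xmltok-attribute regexp):
;; a name, optional whitespace, "=", then a quoted value.
(defconst demo-attr-regexp
  "\\([_[:alpha:]][-._[:alnum:]]*\\)[ \r\t\n]*=[ \r\t\n]*\\(\"[^<\"]*\"\\|'[^<']*'\\)")

(defun demo-time-match (regexp)
  "Return the seconds spent in `looking-at' for REGEXP at point.
An error such as a regexp stack overflow is caught and treated as a
failed match."
  (benchmark-elapse
    (condition-case nil
        (looking-at regexp)
      (error nil))))

(defun demo-time-both (line-length)
  "Time both regexps against a line of LINE-LENGTH \"a\" characters.
Return a list (PLAIN PREFIXED) of elapsed seconds."
  (with-temp-buffer
    (insert (make-string line-length ?a))
    (goto-char (point-min))
    (list (demo-time-match demo-attr-regexp)
          ;; Same regexp with the non-greedy prefix from xmltok.el.
          (demo-time-match (concat "[^<>\n]+?" demo-attr-regexp)))))
```

Evaluating (demo-time-both 2000) and then repeating with larger
lengths shows how each call scales with line length.  Start small,
since the prefixed regexp may take a long time on long lines.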




