bug#61514: 30.0.50; sadistically long xml line hangs emacs


From: Eli Zaretskii
Subject: bug#61514: 30.0.50; sadistically long xml line hangs emacs
Date: Sat, 18 Feb 2023 18:22:58 +0200

> Date: Tue, 14 Feb 2023 16:02:04 -0500
> From:  "Mark A. Hershberger" via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
> 
> 
> There seems to be a regression between 28 and 30 with how emacs handles
> long lines.

No, there's no regression with long lines.  There's an existing bug in
our regexp routines and/or nxml.  See below.

> Bottom line: Emacs 30 is handling files with long lines worse than Emacs
> 28.

This conclusion is incorrect, or at least inaccurate.  Emacs 28.2 has
the same problem as Emacs 30.  Take that a.xml file, truncate it after
250000 characters, then visit it with Emacs 28.2 -- you will see that
Emacs 28.2 freezes exactly like Emacs 30 does.
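
One way to produce such a truncated copy is sketched below.  The
helper name my-truncate-file-copy and the output file name in the
usage line are made up for illustration; only a.xml and the 250000
limit come from the report.

```elisp
;; Hypothetical helper, not part of Emacs: copy the first LIMIT
;; characters of FILE into DEST.
(defun my-truncate-file-copy (file dest limit)
  "Write the first LIMIT characters of FILE to DEST and return DEST."
  (with-temp-buffer
    ;; A temp buffer stays in fundamental-mode, so nxml-mode and its
    ;; tokenizer never see the long line while we cut it.
    (insert-file-contents file)
    (write-region (point-min)
                  (min (point-max) (+ (point-min) limit))
                  dest nil 'silent)
    dest))
```

Assuming a.xml is in the current directory, evaluating
(my-truncate-file-copy "a.xml" "a-250k.xml" 250000) and then visiting
a-250k.xml with "emacs -Q" in each version should allow the comparison.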

The problem is in the combination of nxml-mode and some subtle
bug/misfeature in our regexp routines.  Specifically, when we overflow
the fail stack, we fail to recover in this case, and seem to infloop
inside re_match_2_internal, or maybe recover very inefficiently (I
waited for almost 1 hour before giving up).  The call which causes the
loop is in xmltok.el, in the indicated line:

(defun xmltok-scan-attributes ()
  (let ((recovering nil)
        (atts-needing-normalization nil))
    (while (cond ((or (looking-at (xmltok-attribute regexp))
                      ;; use non-greedy group
                      (when (looking-at (concat "[^<>\n]+?"  <<<<<<<<<<<<<<<<<
                                                (xmltok-attribute regexp)))
                        (unless recovering
                          (xmltok-add-error "Malformed attribute"
                                            (point)
                                            (save-excursion
                                              (goto-char (xmltok-attribute start name))
                                              (skip-chars-backward "\r\n\t ")
                                              (point))))
                        t))

The regexp that causes this is as follows:

  "[^<>\n]+?\\(\\(?:\\(xmlns\\)\\|[_[:alpha:]][-._[:alnum:]]*\\)\\(:[_[:alpha:]][-._[:alnum:]]*\\)?\\)[ \r\t\n]*=\\(?:[ \r\t\n]*\\('[^<'&\r\n\t]*\\([&\r\n\t][^<']*\\)?'\\|\"[^<\"&\r\n\t]*\\([&\r\n\t][^<\"]*\\)?\"\\)\\(?:\\([ \r\t\n]*>\\)\\|\\(?:\\([ \r\t\n]*/\\)\\(>\\)?\\)\\|\\([ \r\t\n]+\\)\\)\\)?"
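
To get a feel for how this regexp behaves outside nxml-mode, a sketch
like the following times a single looking-at call on one long line.
The names my-xmltok-recovery-regexp and my-time-recovery-match are
hypothetical, and the test line (a run of "a" characters with no "="
anywhere) is an assumption chosen for simplicity; it is not the
content of a.xml.  The regexp is the one quoted above, split with
concat only for readability.

```elisp
(require 'benchmark)

;; The regexp from this message, split into pieces for readability.
(defconst my-xmltok-recovery-regexp
  (concat "[^<>\n]+?"
          "\\(\\(?:\\(xmlns\\)\\|[_[:alpha:]][-._[:alnum:]]*\\)"
          "\\(:[_[:alpha:]][-._[:alnum:]]*\\)?\\)"
          "[ \r\t\n]*="
          "\\(?:[ \r\t\n]*"
          "\\('[^<'&\r\n\t]*\\([&\r\n\t][^<']*\\)?'"
          "\\|\"[^<\"&\r\n\t]*\\([&\r\n\t][^<\"]*\\)?\"\\)"
          "\\(?:\\([ \r\t\n]*>\\)"
          "\\|\\(?:\\([ \r\t\n]*/\\)\\(>\\)?\\)"
          "\\|\\([ \r\t\n]+\\)\\)\\)?"))

;; Hypothetical helper: time one match attempt on LEN characters.
(defun my-time-recovery-match (len)
  "Return seconds spent in `looking-at' on a line of LEN \"a\" characters.
Return the symbol `overflow' if the regexp matcher signals an error."
  (with-temp-buffer
    (insert (make-string len ?a))
    (goto-char (point-min))
    (condition-case nil
        (car (benchmark-run 1 (looking-at my-xmltok-recovery-regexp)))
      (error 'overflow))))
```

Evaluating (mapcar #'my-time-recovery-match '(500 1000 2000)) shows how
the time grows with line length on this input.  Keep the lengths small
at first: larger values may take very long, which is the behavior
described in this report.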