emacs-pretest-bug
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Emacs hangs on long words in .html files


From: Stefan Monnier
Subject: Re: Emacs hangs on long words in .html files
Date: Mon, 04 Dec 2006 02:57:09 -0500
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.0.91 (gnu/linux)

>     -(defconst sgml-namespace-re "[_[:alpha:]][-_.[:alnum:]]*")
>     -(defconst sgml-name-re "[_:[:alpha:]][-_.:[:alnum:]]*")
>     +(defconst sgml-namespace-re "[_[:alpha:]][-_.[:alnum:]]\\{,64\\}")
>     +(defconst sgml-name-re "[_:[:alpha:]][-_.:[:alnum:]]\\{,64\\}")
>      (defconst sgml-tag-name-re (concat "<\\([!/?]?" sgml-name-re "\\)"))
>      (defconst sgml-attrs-re "\\(?:[^\"'/><]\\|\"[^\"]*\"\\|'[^']*'\\)*")
>      (defconst sgml-start-tag-regex (concat "<" sgml-name-re sgml-attrs-re)
>     ------------------------------------------------------------------------

>     Whereas before it was taking 67 seconds to open a file containing a
>     16,000 character 'word', I can now open the same file in about a tenth
>     of a second.

> Can someone please investigate why the search for
> "[_[:alpha:]][-_.[:alnum:]]*" is so slow?
> Is something badly implemented in regex.c?

The regexp search that's causing the problem comes from
sgml-font-lock-keywords-1, more specifically from this entry:

    ;; FIXME: this doesn't cover the variables using a default value.
    (,(concat "\\(?:^\\|[ \t]\\)\\(" sgml-namespace-re "\\)\\(?::\\("
              sgml-name-re "\\)\\)?=[\"']")
     (1 (if (match-end 2) sgml-namespace-face font-lock-variable-name-face))
     (2 font-lock-variable-name-face nil t))
    (,(concat "[&%]" sgml-name-re ";?") . font-lock-variable-name-face)))

the regexp ends up being

  
"\\([_[:alpha:]][-_.[:alnum:]]*\\)\\(?::\\([_:[:alpha:]][-_.:[:alnum:]]*\\)\\)?=[\"']"

Now, regexp.c's handling of this regexp is not 100% optimal.  It misses
a fairly simple optimization.  But this optimization only causes a constant
factor slowdown and isn't that serious.  Indeed a quick patch to implement
the needed optimization shows that it doesn't fix the OP's problem.

The OP's problem is that the above regexp is not anchored.  So when faced
with an N-char long word that is not followed by "=", Emacs will first take
O(N) steps to find out that the match fails at the beginning of the word,
then it will take O(N-1) steps to find out that the match also fails if
starting after the first char, then O(N-2) steps to figure out it also fails
if starting after the 2nd char, ...

All in all I suggest the patch below, which does seem to remove the
O(N^2) behavior.


        Stefan


--- orig/lisp/textmodes/sgml-mode.el
+++ mod/lisp/textmodes/sgml-mode.el
@@ -265,7 +265,7 @@
      (1 (if (match-end 2) sgml-namespace-face font-lock-function-name-face))
      (2 font-lock-function-name-face nil t))
     ;; FIXME: this doesn't cover the variables using a default value.
-    (,(concat "\\(" sgml-namespace-re "\\)\\(?::\\("
+    (,(concat "\\(?:^\\|[ \t]\\)\\(" sgml-namespace-re "\\)\\(?::\\("
              sgml-name-re "\\)\\)?=[\"']")
      (1 (if (match-end 2) sgml-namespace-face font-lock-variable-name-face))
      (2 font-lock-variable-name-face nil t))




reply via email to

[Prev in Thread] Current Thread [Next in Thread]