bug#13802: stack overflow in mm-add-meta-html-tag
From: Stefan Monnier
Subject: bug#13802: stack overflow in mm-add-meta-html-tag
Date: Sun, 24 Feb 2013 21:04:21 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3.50 (gnu/linux)
> I see a "Stack overflow in regexp matcher" error traceable back to
> lisp/gnus/mm-decode.el func ‘mm-add-meta-html-tag’ fragment:
> (re-search-forward "\
> <meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']\
> text/\\(\\sw+\\)\\(?:\;\\s-*charset=\\(.+\\)\\)?[\"'][^>]*>" nil t)

Hmm... I don't see any obvious reason for a stack overflow unless the
text has some very long lines or a lot of whitespace between elements.
> One idea (untested) is to replace the ".+" (used to match the charset)
> with a more specific pattern. Perhaps "[^<>]+" or "\\sw+"?

I don't think that would help. To avoid such an overflow, you need to
reduce the backtracking, i.e. reduce the number of cases where two
options are possible according to the simplistic regexp optimizer.
The \s<CHAR> pattern is actually very poor in this respect, because the
optimizer can't know anything about the chars that it matches (since
that depends on text properties).

The flip side is that replacing \\s- with [ \t\n] might help: this way,
the optimizer will see that the + repetition does not need backtracking,
since a char cannot both match a loop iteration and the "after the
loop" content.

Similarly, using [^;'\"]+ instead of \\sw+ would help, and replacing
.+ with [^'\"\n]+ might help as well.
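
Combining the three substitutions above gives something like the sketch below. It is untested inside Gnus and is not the code in mm-decode.el: the names `my-meta-content-type-regexp` and `my-find-meta-content-type` are hypothetical, and binding `case-fold-search` to t is an assumption about how the caller wants to match.

```elisp
;; Hypothetical names, not part of mm-decode.el.  The pattern applies
;; the substitutions suggested above: [ \t\n] for \\s-, [^;'\"]+ for
;; \\sw+, and [^'\"\n]+ for .+.
(defconst my-meta-content-type-regexp
  (concat "<meta[ \t\n]+http-equiv=[\"']?content-type[\"']?"
          "[ \t\n]+content=[\"']"
          ;; Subtype: stop at `;' or a quote, exactly the chars that
          ;; can come next, so a char never fits both sides.
          "text/\\([^;'\"]+\\)"
          ;; Charset: stop at a quote or newline instead of using `.+'.
          "\\(?:;[ \t\n]*charset=\\([^'\"\n]+\\)\\)?"
          "[\"'][^>]*>")
  "Regexp matching an HTML <meta http-equiv=content-type ...> tag.")

(defun my-find-meta-content-type (html)
  "Return (SUBTYPE CHARSET) from the first meta tag in HTML, or nil.
CHARSET is nil when the tag has no charset parameter."
  (with-temp-buffer
    (insert html)
    (goto-char (point-min))
    ;; Assumption: match case-insensitively, since HTML attribute
    ;; names and values like Content-Type vary in case.
    (let ((case-fold-search t))
      (when (re-search-forward my-meta-content-type-regexp nil t)
        (list (match-string 1) (match-string 2))))))
```

For example, (my-find-meta-content-type "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">") returns ("html" "utf-8"). Every repetition in the pattern now excludes the characters that can follow it, which is the property described above for avoiding backtracking.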
> Thinking more systematically, maybe Emacs should add a condition
> ‘stack-overflow/regexp’ (or something like that) such that code can
> ‘condition-case’ for it and try a fallback path.

In reality, such an overflow should only ever happen if you have
backrefs in your regexp.

Stefan