bug#25706: 26.0.50; Slow C file fontification
From: Alan Mackenzie
Subject: bug#25706: 26.0.50; Slow C file fontification
Date: Thu, 10 Dec 2020 12:26:48 +0000
Hello, Mattias.
Thanks for this!
On Wed, Dec 09, 2020 at 18:00:30 +0100, Mattias Engdegård wrote:
> First, some Emacs regexp basics:
> 1. If A and B match single characters, then A\|B should be written
> [AB] whenever possible. The reason is that A\|B adds a backtrack
> record which uses stack space and wastes time if matching fails later
> on. The cost can be quite noticeable, which we have seen.
> 2. Syntax-class constructs are usually better written as character
> alternatives when possible.
> The \sX construct, for some X, is typically somewhat slower to match
> than explicitly listing the characters to match. For example, if all
> you care about are space and tab, then "\\s *" should be written
> "[ \t]*".
> 3. Unicode character classes are slower to match than ASCII-only ones.
> For example, [[:alpha:]] is slower than [A-Za-z], assuming only those
> characters are of interest.
> 4. [^...] will match \n unless \n is included in the set. For example,
> "[^a]\\|$" will almost never match the $ (end-of-line) branch, because
> a newline will be matched by the first branch. The only exception is
> at the very end of the buffer if it is not newline-terminated, but
> that is rarely worth considering for source code.
> 5. \r (carriage return) normally doesn't appear in buffers even if the
> file uses DOS line endings. Line endings are converted into a single
> \n (newline) when the buffer is read. In particular, $ does NOT match
> at \r, only before \n.
> When \r appears it is usually because the file contains a mixture of
> line-ending styles, typically from being edited using broken tools.
> Whether you want to take such files into account is a matter of
> judgement; most modes don't bother.
> 6. Capturing groups cost more than non-capturing groups, but you
> already know that.
> On to specifics: here are annotations for possible improvements in
> cc-langs.el. (I didn't bother about capturing groups here.)
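The matching behaviour described in points 1, 2, 4 and 5 of the quoted list can be checked directly in Emacs Lisp. What follows is only a minimal sketch: `demo-match-span' is a hypothetical helper, the test strings are made up, and none of the regexps are taken from cc-langs.el. The \s example assumes the standard syntax table, where space and tab have whitespace syntax; in a real mode the syntax table in effect may differ. The sketch shows which text is matched, not how long the match takes.

```elisp
;; Hypothetical helper, for illustration only.
(defun demo-match-span (regexp string)
  "Return (START END) of the first match of REGEXP in STRING, or nil."
  (let ((case-fold-search nil))
    ;; Assumption: use the standard syntax table so that the \s
    ;; example does not depend on the current buffer's mode.
    (with-syntax-table (standard-syntax-table)
      (when (string-match regexp string)
        (list (match-beginning 0) (match-end 0))))))

;; Point 1: for single characters, A\|B and [AB] match the same text.
(demo-match-span "a\\|b" "xxbxa")      ; => (2 3)
(demo-match-span "[ab]" "xxbxa")       ; => (2 3)

;; Point 2: if only space and tab matter, the bracket form matches
;; the same run as the whitespace syntax class.
(demo-match-span "\\s *" " \t x")      ; => (0 3)
(demo-match-span "[ \t]*" " \t x")     ; => (0 3)

;; Point 4: [^a] consumes the newline, so the $ branch is not used.
(demo-match-span "[^a]\\|$" "a\nb")    ; => (1 2)
;; With \n excluded from the set, $ matches (an empty match) instead.
(demo-match-span "[^a\n]\\|$" "a\nb")  ; => (1 1)

;; Point 5: $ matches before \n, but not before \r.
(demo-match-span "b$" "b\n")           ; => (0 1)
(demo-match-span "b$" "b\r\n")         ; => nil
```

Each form can be evaluated with C-x C-e in the *scratch* buffer once the helper has been defined.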
I think we should get around to fixing the regexps in CC Mode soon. But
I think I would rather do this as a separate exercise, since the patch
for this bug is already around 800 lines and Ravine Var, the OP, has
found further problems on a slowish machine.
In particular, some of the fixes in your patch relate to the CPP
constructs, and they might well be slowing down that regexp in
c-find-decl-spots I highlighted earlier. So I'm keen to look at this
again, once the current bug is settled.
--
Alan Mackenzie (Nuremberg, Germany).