bug-gnu-emacs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#58558: 29.0.50; re-search-forward is slow in some buffers


From: Stefan Monnier
Subject: bug#58558: 29.0.50; re-search-forward is slow in some buffers
Date: Tue, 13 Dec 2022 12:38:13 -0500
User-agent: Gnus/5.13 (Gnus v5.13)

>>> I will look how to do it. Maybe perf probe.
>>> I guess, it will be useful to compile Emacs with debug symbols at this
>>> point.
>>
>> AFAIR, you can ask perf to profile a single function, and you can ask
>> it to annotate the profile with the source code.
>
> I now compiled Emacs with debug symbols, waited enough to see observable
> increase in the benchmark-run timing, and recorded the perf data.
>
> buf_bytepos_to_charpos is still on the top
>
>     78.06%  emacs         emacs                       [.] 
> buf_bytepos_to_charpos
>      3.00%  emacs         emacs                       [.] re_match_2_internal
>      1.05%  emacs         emacs                       [.] find_interval
>      1.04%  emacs         emacs                       [.] CHAR_TABLE_REF_ASCII
>      0.85%  emacs         emacs                       [.] make_lisp_symbol
>      0.80%  emacs         emacs                       [.] re_search_2
>      0.76%  emacs         emacs                       [.] builtin_lisp_symbol
>      0.62%  emacs         emacs                       [.] PSEUDOVECTORP

AFAIK the main places where we call `buf_bytepos_to_charpos` from
`re_match_2_internal` is via the `SYNTAX_TABLE_BYTE_TO_CHAR` macro, used
for regexp elements that depend on syntax tables (i.e. \<, \>, \_<, ...).

But I'd expect those to be executed "frequently&closely" enough that the
`cached_(byte|char)pos` data should almost always be nearby, making the
call to `buf_bytepos_to_charpos` fairly cheap (more specifically
the `for (tail = BUF_MARKERS (b);...` loop should not iterate many
times, regardless how many markers there are).

> My guess: number of markers is growing somehow?

`buf_bytepos_to_charpos` itself creates markers (using them as a cache
of previous conversions), so that might be why.

But we only look at the first N markers where N*50 is the distance to
the closest marker found so far.  So growth is not sufficient (it's
clearly a part of the reason, tho).

Regarding growth: could you call `garbage-collect` between the calls to
`re-search-forward` to see if that avoids the accumulation?
[ I presume here that those markers are created/added by
  `buf_bytepos_to_charpos` itself, so they should be recovered by the GC
  because they're not referenced from anywhere else.  ]

I'd be interested to know how many iterations of the `for (tail =
BUF_MARKERS (b);...` loop are executed on average during your
`re-search-forward` (and how that average changes between runs of
`re-search-forward`).


        Stefan


PS: Of course, another approach would be to replace this code with
something else.  Using markers as a cache of bytepos/charpos conversions
has been a source of a few performance issues over the year.

Another approach could be to use a "vector with gap" managed alongside
the actual buffer text.  It could be indexed by "charpos divided by
1024", so conversion from charpos to bytepos could be a simple vector
lookup followed by scanning at most 1kB, and conversion in the other
direction would use a binary search in that same vector (or we could use
2 "vectors with gap", one per direction of conversion).






reply via email to

[Prev in Thread] Current Thread [Next in Thread]