bug-gnu-emacs

bug#56682: Fix the long lines font locking related slowdowns


From: Dmitry Gutov
Subject: bug#56682: Fix the long lines font locking related slowdowns
Date: Sun, 14 Aug 2022 20:47:40 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.9.1

On 14.08.2022 16:15, Eli Zaretskii wrote:

>> There's no point in doing that. Either we narrow to some area around
>> point (we might even use a larger radius, like 1 MB), or we only
>> fontify up to some position. The former easily creates bad fontification.

>> The alternative, of course, is to pay the price of syntax-ppss on larger
>> spans and wait the corresponding amount of time the first time the user
>> scrolls to EOB. That's what the current default on the branch does.

> You are still thinking in terms of the original design of syntactical
> analysis which strives to produce 100% accurate results.  That design
> principle doesn't work with very long lines, so sticking to it would
> indeed lead us to give up on solving the problem.

s/very long lines/very large files

In any case, the "original design" is not going anywhere (as the only way to achieve correctness), and I'm talking in terms of the balance between accuracy and performance. To use Gregory's narrowing approach in font-lock, check out the branch under discussion (scratch/font_lock_large_files) and evaluate

  (setq font-lock-large-files '(narrow . 5000))

You'll see the same behavior as on master now (except narrowing isn't "hard"), with the same performance characteristics.
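For illustration, here is a minimal sketch of what such a "soft" narrowing strategy amounts to. This is not the branch's actual code, and the function name is made up; it just shows the basic idea of bounding the cost of syntax analysis by a radius around the region being fontified:

```elisp
;; Hypothetical sketch of the (narrow . RADIUS) idea: run syntax
;; analysis and fontification only within RADIUS characters around
;; the region being fontified, instead of scanning from point-min.
(defun my/fontify-region-narrowed (beg end radius)
  "Fontify BEG..END, restricting syntax analysis to RADIUS around it."
  (save-restriction
    (narrow-to-region (max (point-min) (- beg radius))
                      (min (point-max) (+ end radius)))
    ;; Within the restriction, syntax-ppss and friends only see a
    ;; bounded amount of text, so the cost no longer grows with
    ;; buffer size -- at the price of possibly wrong results when a
    ;; string or comment opens outside the restriction.
    (font-lock-fontify-region beg end)))
```

The trade-off is exactly the one discussed above: bounded work per redisplay, but fontification can go wrong whenever the syntactic context begins before the narrowed region.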

> The better way is to acknowledge that some inaccuracies are acceptable
> in those cases.  With that in mind, one can design a syntax analyzer
> that looks back only a short way, until it finds some place that
> could reasonably serve as an anchor point for heuristic decisions
> about whether we are inside or outside a string or comment, and then
> verifies that guess against some telltale syntactic elements that follow
> (like semi-colons or comment-end delimiters in C).  While this kind of
> heuristic can sometimes fail, if it only fails rarely, the result is
> a huge win.
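A toy version of such a heuristic for C-like modes might look like the following. This is purely illustrative (the `my/` names are invented, not part of any branch): it scans back for a brace in column zero, the traditional Emacs defun-boundary heuristic, and treats that as a probably-safe anchor instead of parsing from the beginning of the buffer:

```elisp
;; Illustrative heuristic anchor finder: guess that a "{" or "}" in
;; column 0 is top-level code (the classic Emacs defun-boundary
;; heuristic) and start syntactic analysis there.  It can misfire
;; when such a brace sits inside a string or comment, but in typical
;; code that is rare.
(defun my/heuristic-syntax-anchor ()
  "Return a position before point that is probably top-level code."
  (save-excursion
    (if (re-search-backward "^[{}]" nil t)
        (point)
      (point-min))))

;; Usage sketch: parse from the anchor instead of from point-min.
;; (parse-partial-sexp (my/heuristic-syntax-anchor) (point))
```

As the quoted text notes, the guess could additionally be sanity-checked against following tokens (semicolons, comment-end delimiters) before being trusted.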

You cannot design a language-agnostic syntax analyzer like that. It's something every major mode would have to work out how to implement for itself.

It's relatively easy to design for JSON (again) because the syntax is so simple, but for others -- not so much.

So we need to settle on the basic design first. The code on the branch includes the narrowing approach, which is trivially extended to use the "find safe place" hook when one is available. But such a hook won't always be available.

>> But as Gregory shows, when you get to _really_ large files (like the 1 GB
>> JSON file in his example), pressing M-> will still make you wait (I have
>> to wait around 20 seconds).

> Try with the latest master; it might have improved (fingers crossed).

All improvements are welcome, but that's unlikely:

> In any case, the way to speed up these cases is to look at the profile
> and identify the code that is slowing us down; then attempt to make it
> faster.  (20 sec is actually long enough for us to interrupt Emacs
> under a debugger and look at the backtrace to find the culprit.)

I've already profiled and benchmarked this scenario: all of the delay (17 seconds, to be precise) comes from parse-partial-sexp. 1 GB is a lot.

>>>> So the "don't fontify past X" strategy is simply based on the idea
>>>> that no fontification is probably better than an unreliable and
>>>> obviously incorrect one.

>>> I disagree with that idea, but if someone agrees with you, they can
>>> simply turn off font-lock.  As was already mentioned many times in
>>> this endless discussion.

>> If someone agrees with me, they will simply be able to customize
>> font-lock-large-files to choose this strategy.

> If that solves the problems in a reasonable way for very long lines,
> maybe we will eventually have such an option.

Can I merge the branch, then?

I was hoping for a stylistic review, perhaps: whether you like the name of the variable, and whether it should be split in two.

A change of the default value(s) is on the table too.

>> I'm still waiting for people to come forward with other major modes
>> which have the same kind of problems. Preferably ones that are likely to
>> be used with large files.

> One such major mode and one such file was presented long ago: a
> single-line XML file.

XML is indeed slower. It takes me almost 3 seconds to scroll to the end of a 20 MB XML file.

Most of it comes from sgml--syntax-propertize-ppss, which is probably justified: XML is a more complex language.

But other than the initial delay, scrolling, isearch, and local editing all work fast, unlike in the original situation with JSON.




