bug#16830: [Bug] 24.3.50; massive slow down in forward-line

bug-gnu-emacs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#16830: [Bug] 24.3.50; massive slow down in forward-line

From:	Eli Zaretskii
Subject:	bug#16830: [Bug] 24.3.50; massive slow down in forward-line
Date:	Mon, 10 Mar 2014 20:58:22 +0200

> Date: Sat, 22 Feb 2014 13:27:47 +0100
> From: "Stefan-W. Hahn" <stefan.hahn@s-hahn.de>
> Cc: 16830@debbugs.gnu.org
> 
> my original org-mode file:
> emacs 24.3.50.1    2.065 sec (cache-long-scans nil, set local)
> emacs 24.3.1       0.722 sec (cache-long-line-scan nil; as it was)
> 
> test-neuter.org:
> emacs 24.3.50.1    0.925 sec (cache-long-scans nil, set local)
> emacs 24.3.1       0.381 sec (cache-long-line-scan nil; as it was)

So I think it's a very Good Thing that we have this bug report,
because looking into this issue produced a surprising (for me)
discovery: turning off the newline cache makes forward-line
significantly (5 to 20 times, depending on the file) faster than when
the cache is turned on, for files with relatively short lines.

It looks like the GCC implementation of memchr, used by the "dumb
loop" in find_newline, is so efficient that it easily outperforms the
"smart loop" which uses the cache, unless lines are very long, like at
least 400 characters, at which point the code performs with and
without the cache at the same speed.

This striking difference in speed goes mostly unnoticed, because
typically Emacs seldom if ever calls forward-line too much (see below
for how much should "too much" for this to become visible).  However,
typing "C-c C-c" in the OP's Org file does just that -- it causes
forward-line be invoked a huge number of times, because it walks the
(29K line) Org file one line at a time with this function:

  (defsubst org-goto-line (N)
    (save-restriction
      (widen)
      (goto-char (point-min))
      (forward-line (1- N))))

IOW, to move from line N to line N+1, this goes back to the beginning,
and then traverses all the N lines again, plus one more line.  Not a
very efficient way, to put it mildly.  So I suggest that Org
developers look into making this use case more efficient, no matter
whether the changes suggested below are or aren't installed.

Inspired by the above function, I profiled find_newline, which is the
workhorse of forward-line, using xdisp.c as the test file and this
silly program:

      (let ((n 1))
       (while (not (eobp))
        (goto-char 1) 
        (forward-line n) 
        (setq n (1+ n))))

Running this program on xdisp.c calls find_newline 30K times and
exercises its inner loop, which looks for the next newline, 450
million times.  On my machine and in an optimized build, this takes
about 1 min 14 sec with the cache turned on, and only 12 sec with it
turned off, a factor of 6.

By careful optimization of find_newline, I have succeeded to slash the
time of the above loop by a factor of 2, see the proposed patch below.
The "C-c C-c" command in the OP's Org file runs 3 times faster with
those changes, and takes only 2 sec instead of 6.5.  This still leaves
the no-cache operation faster by a factor of about 3, though, in both
these test cases.

Again, this is for files whose average line length is 30 to 50
characters.  For files whose lines are at least 10 times longer,
the times with and without cache become almost identical, and for
longer lines the cache starts to win.

It would be nice to be able to turn the cache on and off dynamically,
depending on the actual line length of the buffer.  I tried to
implement this, but my naive implementation didn't work well, because
sampling of the lines tends to be extremely un-representative.  If
someone can come up with a smarter implementation, please show it.

Until we can dynamically estimate the line length and turn the cache
on only for long lines, I suggest to leave the default ON, and install
the patches below.  My reasoning is that in most situations the
slow-down is negligible, while for very long lines the speedup can be
significant.

For the record, here are the measurements I made, before and after the
changes, with 2 test cases: xdisp.c scanning with the above program,
and the "neutered" Org file posted by Stefan-W. Hahn:

   Test     Code   Optimized?    Cache ON    Cache OFF
   ---------------------------------------------------
   Org      old    NO              11.3s      0.8s
   Org      new    NO               3.6s      0.8s
   Org      old    YES              6.5s      0.3s
   Org      new    YES              2.0s      0.25s
   xdisp.c  old    NO            2m11.8s     14.7s
   xdisp.c  new    NO            1m03.3s     14.8s
   xdisp.c  old    YES           1m14.4s     12.0s
   xdisp.c  new    YES             32.5s     11.8s

And here are the patches I propose.  (Note that I only handled the
forward scan; the backward scan is used much less, so I left it alone,
but if someone thinks the asymmetry might be confusing, I can do the
same surgery with backward scan.)

Any objections to committing this?

--- src/search.c.~2~    2014-01-02 07:07:04.000000000 +0200
+++ src/search.c        2014-03-10 19:40:08.607562800 +0200
@@ -715,18 +715,61 @@ find_newline (ptrdiff_t start, ptrdiff_t
            examine.  */
        ptrdiff_t tem, ceiling_byte = end_byte - 1;
 
-        /* If we're looking for a newline, consult the newline cache
-           to see where we can avoid some scanning.  */
+        /* If we're using the newline cache, consult it to see whether
+           we can avoid some scanning.  */
         if (newline_cache)
           {
             ptrdiff_t next_change;
+           int result = 1;
+
             immediate_quit = 0;
-            while (region_cache_forward
-                   (cache_buffer, newline_cache, start, &next_change))
-              start = next_change;
-            immediate_quit = allow_quit;
+            while (start < end && result)
+             {
+               ptrdiff_t lim1;
 
-           start_byte = CHAR_TO_BYTE (start);
+               result = region_cache_forward (cache_buffer, newline_cache,
+                                              start, &next_change);
+               if (result)
+                 {
+                   start = next_change;
+                   lim1 = next_change = end;
+                 }
+               else
+                 lim1 = min (next_change, end);
+
+               /* The cache returned zero for this region; see if
+                  this is because the region is known and includes
+                  only newlines.  While at that, count any newlines
+                  we bump into, and exit if we found enough off them.  */
+               start_byte = CHAR_TO_BYTE (start);
+               while (start < lim1
+                      && FETCH_BYTE (start_byte) == '\n')
+                 {
+                   start_byte++;
+                   start++;
+                   if (--count == 0)
+                     {
+                       if (bytepos)
+                         *bytepos = start_byte;
+                       return start;
+                     }
+                 }
+               /* If we found a non-newline character before hitting
+                  position where the cache will again return non-zero
+                  (i.e. no newlines beyond that position), it means
+                  this region is not yet known to the cache, and we
+                  must resort to the "dumb loop" method.  */
+               if (start < next_change && !result)
+                 break;
+               result = 1;
+             }
+           if (start >= end)
+             {
+               start = end;
+               start_byte = end_byte;
+               break;
+             }
+            immediate_quit = allow_quit;
 
             /* START should never be after END.  */
             if (start_byte > ceiling_byte)
@@ -762,9 +805,9 @@ find_newline (ptrdiff_t start, ptrdiff_t
              unsigned char *nl = memchr (lim_addr + cursor, '\n', - cursor);
              next = nl ? nl - lim_addr : 0;
 
-              /* If we're looking for newlines, cache the fact that
-                 this line's region is free of them. */
-              if (newline_cache)
+              /* If we're using the newline cache, cache the fact that
+                 the region we just traversed is free of newlines. */
+              if (newline_cache && cursor != next)
                {
                  know_region_cache (cache_buffer, newline_cache,
                                     BYTE_TO_CHAR (lim_byte + cursor),
@@ -840,7 +883,7 @@ find_newline (ptrdiff_t start, ptrdiff_t
 
               /* If we're looking for newlines, cache the fact that
                  this line's region is free of them. */
-              if (newline_cache)
+              if (newline_cache && cursor != prev + 1)
                {
                  know_region_cache (cache_buffer, newline_cache,
                                     BYTE_TO_CHAR (ceiling_byte + prev + 1),

[Prev in Thread]

Current Thread

[Next in Thread]

bug#16830: [Bug] 24.3.50; massive slow down in forward-line, Eli Zaretskii <=
- bug#16830: [Bug] 24.3.50; massive slow down in forward-line, martin rudalics, 2014/03/11
  - bug#16830: [Bug] 24.3.50; massive slow down in forward-line, Eli Zaretskii, 2014/03/11
  - bug#16830: [Bug] 24.3.50; massive slow down in forward-line, Stefan Monnier, 2014/03/12
    - bug#16830: [Bug] 24.3.50; massive slow down in forward-line, Eli Zaretskii, 2014/03/16
- bug#16830: Re: bug#16830: [Bug] 24.3.50; massive slow down in forward-line, Dmitry Antipov, 2014/03/17
  - bug#16830: [Bug] 24.3.50; massive slow down in forward-line, Eli Zaretskii, 2014/03/17

Prev by Date: bug#16971: 24.3.50; comment-beginning behaves differently in current newcomment.el, leads to error in comment-indent-new-line
Next by Date: bug#16967: frame related race condition
Previous by thread: bug#16980: eieio-generic-call shouldn't load libraries
Next by thread: bug#16830: [Bug] 24.3.50; massive slow down in forward-line
Index(es):
- Date
- Thread