bug#20154: 25.0.50; json-encode-string is too slow for large strings


From: Eli Zaretskii
Subject: bug#20154: 25.0.50; json-encode-string is too slow for large strings
Date: Sat, 21 Mar 2015 09:58:43 +0200

> Date: Sat, 21 Mar 2015 00:02:51 +0200
> From: Dmitry Gutov <dgutov@yandex.ru>
> CC: 20154@debbugs.gnu.org
> 
> On 03/20/2015 11:14 PM, Eli Zaretskii wrote:
> 
> >> Making the string 10 times longer increases the runtime by ~5x here
> >> (0.1s -> 0.5s).  Another 10x increase in length makes it run 4.3 seconds.
> >
> > So maybe writing this in C is the way to go.
> 
> Maybe implementing `json-encode-string` itself in C isn't strictly 
> necessary, or even particularly advantageous.

It depends on your requirements.  How fast would it need to run to
satisfy your needs?

> How about trying to optimize `replace-match' or
> `replace-regexp-in-string' (which are the two main approaches we can
> use to implement `json-encode-string') for the case of large input?
> 
> Take this example:
> 
> (setq s1 (apply #'concat (cl-loop for i from 1 to 30000
>                                   collect "123456789\n"))
>       s2 (apply #'concat (cl-loop for i from 1 to 15000
>                                   collect "1234567890123456789\n")))
> 
> On my machine,
> 
> (replace-regexp-in-string "\n" "z" s1 t t)
> 
> takes ~0.13s, while
> 
> (replace-regexp-in-string "\n" "z" s2 t t)
> 
> clocks in at ~0.08-0.10s.
> 
> Which is, again, pretty slow by modern standards.
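
(For reference, those timings can be reproduced with `benchmark-run',
using the s1 and s2 strings defined above; absolute numbers will of
course vary by machine.)

  ;; `benchmark-run' returns (ELAPSED-SECONDS GC-COUNT GC-SECONDS).
  (benchmark-run 1 (replace-regexp-in-string "\n" "z" s1 t t))
  (benchmark-run 1 (replace-regexp-in-string "\n" "z" s2 t t))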

You don't really need a regexp replacement function with all its
features here, do you?  What you need is a way to skip characters that
are "okay", then replace the character that is "not okay" with its
encoded form, then repeat.

So an alternative strategy would be to use 'skip-chars-forward' to
skip to the next locus where encoding is necessary, and 'concat' to
construct the output string from the "okay" parts and the encoded "not
okay" parts.  This bypasses most of the unneeded complications in
'replace-match' and 'replace-regexp-in-string'.  I don't know whether
that will be faster, but I think it's worth trying.  For starters, how
fast can you iterate through the string with 'skip-chars-forward',
stopping at characters that need encoding, without actually encoding
them, just consing the output string from the parts delimited by the
places where 'skip-chars-forward' stopped?  That's the lower bound on
the performance of this method.
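
For instance, something like this untested sketch (the function name is
made up, and the character set is only a guess at what JSON needs to
escape):

  (defun my-json-scan-string (string)
    "Scan STRING like a JSON encoder would, without encoding anything."
    (with-temp-buffer
      (insert string)
      (goto-char (point-min))
      (let ((parts '()))
        (while (not (eobp))
          (let ((start (point)))
            ;; Skip characters that need no escaping in JSON:
            ;; everything except `"', `\' and control characters.
            (skip-chars-forward "^\"\\\\\u0000-\u001f")
            (push (buffer-substring-no-properties start (point)) parts))
          (unless (eobp)
            ;; A real encoder would push the escape sequence here;
            ;; this lower-bound version keeps the character as-is.
            (push (string (char-after)) parts)
            (forward-char 1)))
        (apply #'concat (nreverse parts)))))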

> (And I've only now realized that the above function is implemented in 
> Lisp; all the more reason to move it to C).

I think the latest tendency is the opposite: move to Lisp everything
that doesn't need to be in C.  If some specific application needs more
speed than we can provide, the first thing I'd try is to think of a new
primitive, abstracting your use case enough to make it useful beyond
just JSON.

Of course, implementing the precise use case in C first is probably a
prerequisite, since it could turn out that the problem is somewhere
else, or that even in C you won't get the speed you want.

> Replacing "z" with #'identity (so now we include a function call 
> overhead) increases the averages to 0.15s and 0.10s respectively.

Sounds like the overhead of the Lisp interpreter is a significant
factor here, no?
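
(For reference, with the s1/s2 strings from above, that comparison
looks like this; a function REP goes through `funcall' once per match,
which the literal-string replacement avoids:)

  (benchmark-run 1 (replace-regexp-in-string "\n" #'identity s1 t t))
  (benchmark-run 1 (replace-regexp-in-string "\n" #'identity s2 t t))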




