
bug#31138: Native json slower than json.el

From: Dmitry Gutov
Subject: bug#31138: Native json slower than json.el
Date: Tue, 23 Apr 2019 14:39:50 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1

On 23.04.2019 13:22, Eli Zaretskii wrote:

>>> Yes, but I'm slightly surprised why you loop from the end of the
>>> string and not from the beginning.

>> To avoid creating an additional pointer variable.

> I don't think it matters, and looping forward is more natural and may
> even be slightly faster.

OK. It was mostly a matter of taste for me anyway. (I would be interested in any examples of "slightly faster", though).
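For concreteness, here is a minimal sketch of the two loop shapes under discussion (hypothetical illustration, not the actual json.c code): the backward loop reuses the length as its counter, while the forward loop needs the extra end pointer but reads memory in ascending order, which hardware prefetchers tend to favor.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical example: the same byte scan written both ways.  */

static bool
all_ascii_backward (const unsigned char *p, size_t len)
{
  /* Counts down; no extra variable beyond the length itself.  */
  while (len > 0)
    if (p[--len] >= 0x80)
      return false;
  return true;
}

static bool
all_ascii_forward (const unsigned char *p, size_t len)
{
  const unsigned char *end = p + len;  /* the "additional pointer" */
  for (; p < end; p++)
    if (*p >= 0x80)
      return false;
  return true;
}
```

Either way the work per byte is identical; any difference would come from memory access order, so "slightly faster" is plausible but would need measuring.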

>>> I guess that's expected when the strings in JSON are short enough.

>> Longer strings take a proportional amount of time to encode, though
>> (only 2x as fast per character, IIRC).

> I was talking about decoding.  Assuming that decode_coding_utf_8 has
> some setup overhead before it starts the loop of processing the bytes,
> that overhead will become less significant with longer strings.  And
> indeed, if I make the strings in large.json be 10K characters (can
> this happen in real-life JSONs?),

Everything can happen, but I'm not aware of a particular application.

> the speedup from using
> make_specified_string for valid UTF-8 input goes down to just 40% for
> unoptimized builds and 20% for optimized (see the timing data below).
> But it's still faster even for such large strings, so I installed a
> variant of what we were discussing.

Thank you.

And for small strings, your numbers seem even more encouraging than mine.
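The fast path under discussion can be sketched as follows; this is an illustrative stand-alone UTF-8 validator, not Emacs's actual implementation. The idea is that if the parser's input bytes are already well-formed UTF-8, the string can be handed to make_specified_string directly, skipping the general decode_coding_utf_8 machinery.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative validator (not the Emacs code): return true if the
   LEN bytes at S form well-formed UTF-8, rejecting truncated
   sequences, overlong encodings, surrogates and values > U+10FFFF.  */
static bool
utf8_valid_p (const unsigned char *s, size_t len)
{
  size_t i = 0;
  while (i < len)
    {
      unsigned char c = s[i];
      size_t n;          /* number of continuation bytes */
      unsigned int min;  /* smallest code point for this length */
      unsigned int cp;
      if (c < 0x80)
        { i++; continue; }
      else if ((c & 0xE0) == 0xC0)
        { n = 1; min = 0x80; cp = c & 0x1F; }
      else if ((c & 0xF0) == 0xE0)
        { n = 2; min = 0x800; cp = c & 0x0F; }
      else if ((c & 0xF8) == 0xF0)
        { n = 3; min = 0x10000; cp = c & 0x07; }
      else
        return false;  /* stray continuation byte or 0xF8..0xFF */
      if (i + n >= len)
        return false;  /* truncated sequence */
      for (size_t j = 1; j <= n; j++)
        {
          if ((s[i + j] & 0xC0) != 0x80)
            return false;
          cp = (cp << 6) | (s[i + j] & 0x3F);
        }
      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;  /* overlong, out of range, or surrogate */
      i += n + 1;
    }
  return true;
}
```

A single forward pass like this touches each byte once, which is why the validate-then-copy approach beats a full decode even though it reads the data twice in the worst case.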

> Comparing with json.el shows that we've got an 8-fold to 10-fold
> speedup in optimized builds.

> Here are my timings for the various variants ("large" means with JSON
> input where all strings were enlarged to 10K characters):
>
>   variant                       | unoptimized | optimized
>   current master                |    3.563    |   0.664
>   current master, large         |  174.0      |  43.34
>   no validation                 |    0.980    |   0.326
>   no validation, large          |  105.1      |  33.13
>   coding_system directly        |    2.962    |   0.660
>   coding_system directly, large |  173.4      |  43.19
>   UTF-8 validation              |    0.980    |   0.334
>   UTF-8 validation, large       |  105.9      |  34.36

0.334 vs 0.664, I like that. :-)

> In all cases, the times are from 10 benchmark loops, after subtracting
> the time used by GC.

I figured we might be saving a bit on GC pauses as well (doing and allocating less stuff), but they are harder to time, of course.
