bug#31138: Native json slower than json.el

From: Dmitry Gutov
Subject: bug#31138: Native json slower than json.el
Date: Mon, 22 Apr 2019 18:02:35 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1

On 22.04.2019 16:02, Eli Zaretskii wrote:

>> Thank you. Tried it, tests now pass, and the performance improvement is
>> the same. I compared the same benchmark (100 iterations, GC disabled for
>> the whole duration), and this patch takes
>>
>> $ src/emacs -Q --batch -l ~/examples/elisp/json-test.el
>> Elapsed time: 51.153870s
>>
>> down to
>>
>> $ src/emacs -Q --batch -l ~/examples/elisp/json-test.el
>> Elapsed time: 26.268435s
>>
>> Are you still against it? (Just checking).

> I'm still against using that patch as is, yes.  I'm okay with using
> make_specified_string if before calling it we make sure the string is
> plain ASCII or a series of proper UTF-8 sequences.  Not sure how much
> of a performance hit would such tests cost us, but if you are
> interested, let's time them.
>
> (Let me know if you need help in writing the code for the above 2
> tests.  I think parse_str_as_multibyte should help a lot.)

I do.

At the very least: am I supposed to call parse_str_as_multibyte the way make_string does, or to write a new function modeled on parse_str_as_multibyte? I can more or less follow its logic, but I can't tell whether any of its callees fail on malformed input.
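
For reference, here is a minimal standalone sketch of the kind of check being discussed: accept a buffer only if it is plain ASCII or well-formed UTF-8. It is not Emacs code and does not use parse_str_as_multibyte; the name utf8_valid_p is made up for illustration. The idea would be to call something like it before make_specified_string and fall back to code_convert_string when it returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of Emacs: return true if the LEN bytes
   at S are plain ASCII or well-formed UTF-8 (no overlong forms, no
   UTF-16 surrogates, nothing above U+10FFFF).  */
static bool
utf8_valid_p (const char *s, ptrdiff_t len)
{
  assert (s != NULL && len >= 0);
  const unsigned char *p = (const unsigned char *) s;
  const unsigned char *end = p + len;

  while (p < end)
    {
      unsigned char c = *p;
      int trail;                           /* Trail bytes expected.  */
      unsigned char lo = 0x80, hi = 0xBF;  /* Range of 1st trail byte.  */

      if (c < 0x80)
        {
          p++;                             /* ASCII fast path.  */
          continue;
        }
      else if (c >= 0xC2 && c <= 0xDF)
        trail = 1;
      else if (c >= 0xE0 && c <= 0xEF)
        {
          trail = 2;
          if (c == 0xE0) lo = 0xA0;        /* Overlong 3-byte forms.  */
          if (c == 0xED) hi = 0x9F;        /* UTF-16 surrogates.  */
        }
      else if (c >= 0xF0 && c <= 0xF4)
        {
          trail = 3;
          if (c == 0xF0) lo = 0x90;        /* Overlong 4-byte forms.  */
          if (c == 0xF4) hi = 0x8F;        /* Above U+10FFFF.  */
        }
      else
        return false;        /* Stray trail byte, 0xC0/0xC1, or > 0xF4.  */

      if (end - p <= trail)
        return false;                      /* Truncated sequence.  */
      if (p[1] < lo || p[1] > hi)
        return false;
      for (int i = 2; i <= trail; i++)
        if ((p[i] & 0xC0) != 0x80)
          return false;
      p += trail + 1;
    }
  return true;
}
```

The ASCII branch comes first so that mostly-ASCII JSON pays about one comparison per byte; whether that is cheap enough next to the current code_convert_string call is exactly what would need timing.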

> I guess we should also have some test case with non-ASCII characters,
> if we will introduce these optimizations.

We already do in test/src/json-tests.el, like I previously mentioned. And the simple patch (which you're against) passes them. I've put the patch at the end of this email so we're on the same page.


> P.S.  I'm still not sure these optimizations will make the OP happy,
> since at some point I heard them saying that our present performance
> is abysmally slow and inadequate.

Well... IIUC Node.js's JSON parsing is ~10 times as fast. For comparison, Ruby's parsers range from ~3 to ~9 times as fast.

For LSP usage, we are of course comparing with Node. But since we're still here, and lsp-mode has some users, reaching Node's performance level is likely not a life-or-death situation.

> If that wasn't a wild exaggeration,
> then halving the time will still be inadequate.  So maybe we should
> agree in advance whether 30% to 50% improvement will be "good enough",
> before we embark on this adventure.

If we're talking about big changes and increases in complexity, sure, we should weigh them. But if a simple change gives us even a 20-30% improvement, why not take it? The reporter is not the only one who parses JSON in Emacs.

Speaking of bigger improvements... it seems that with the patch below, and the fact that it passes the existing tests, we have at least established that the contents of the C strings that libjansson returns and our "decoded" strings are very often exactly the same. So most of the time what code_convert_string does is not really conversion, but in effect verification. I'm betting it's a frequent situation in other use cases, too.

So one optimization (more complex to implement, I'm sure) would be to defer creating coding->dst_object inside decode_coding_object until we're sure we need it (that is, until the source and destination bytes actually come out different), and if we never do, to return src_object in the end (I'm only talking about the case when dst_object is Qt). That might improve performance across the board, including during the encoding step. Or it might not, of course. What do you think?
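
To make the idea concrete, here is a rough standalone sketch of "allocate the destination only on the first divergence". It has nothing to do with the real decode_coding_object internals: the function name convert_lazily is made up, and CRLF-to-LF normalization merely stands in for whatever the decoder would actually change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, unrelated to the real decode_coding_object:
   normalize CRLF to LF in the LEN bytes at SRC.  If the input has no
   CRLF, the output would be byte-identical, so return SRC itself and
   allocate nothing.  Otherwise return a malloc'd buffer that the
   caller must free.  *OUT_LEN receives the result length.  Return
   NULL if allocation fails.  */
static const char *
convert_lazily (const char *src, size_t len, size_t *out_len)
{
  assert (src != NULL && out_len != NULL);
  char *dst = NULL;     /* Stays NULL while output equals input.  */
  size_t n = 0;         /* Bytes written to DST so far.  */

  for (size_t i = 0; i < len; i++)
    {
      /* A CR directly followed by LF is dropped from the output.  */
      bool drop = src[i] == '\r' && i + 1 < len && src[i + 1] == '\n';

      if (drop && dst == NULL)
        {
          /* First divergence: allocate now and copy the prefix that
             has already been verified as unchanged.  The output is
             never longer than the input, so LEN bytes suffice.  */
          dst = malloc (len);
          if (dst == NULL)
            return NULL;
          memcpy (dst, src, i);
          n = i;
        }
      if (dst != NULL && !drop)
        dst[n++] = src[i];
    }

  if (dst == NULL)
    {
      *out_len = len;   /* Nothing changed: hand back the source.  */
      return src;
    }
  *out_len = n;
  return dst;
}
```

The caller can tell the two cases apart by comparing the returned pointer with src, and must free the result only when they differ. In the common case where nothing changes, the cost is a single read-only pass with no allocation and no copying.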

diff --git a/src/json.c b/src/json.c
index 928825e034..2b0cc8a313 100644
--- a/src/json.c
+++ b/src/json.c
@@ -225,8 +225,7 @@ json_has_suffix (const char *string, const char *suffix)
 static Lisp_Object
 json_make_string (const char *data, ptrdiff_t size)
 {
-  return code_convert_string (make_specified_string (data, -1, size, false),
-                              Qutf_8_unix, Qt, false, true, true);
+  return make_specified_string (data, -1, size, false);
 }

 /* Create a multibyte Lisp string from the NUL-terminated UTF-8
