bug-gnu-emacs

bug#31138: Native json slower than json.el


From: Sébastien Chapuis
Subject: bug#31138: Native json slower than json.el
Date: Sat, 23 Mar 2019 09:59:23 +0800

Hello,

I have tried to find the cause of this, but still without success.
Here is a reproducible case:

You can download the json file at:
https://gist.githubusercontent.com/yyoncho/dec968b69185305ed02741e18b27a82d/raw/334b0a51bc52cc3c98edb8ff4bccb5fc4531842b/large.json

Open the file with `emacs -Q large.json`.
Switch to the scratch buffer and run:

```
(with-current-buffer "large.json"
  (benchmark-run 10 (json-parse-string (buffer-string))))
;;; (2.5371836119999998 10 0.111044641)

(with-current-buffer "large.json"
  (let ((str (buffer-string)))
    (benchmark-run 10 (with-temp-buffer (json-parse-string str)))))
;;; (1.510604359 10 0.13192760000000003)

(with-current-buffer "large.json"
  (let ((str (buffer-string)))
    (benchmark-run 10 (with-temp-buffer (json-read-from-string str)))))
;;; (1.970248228 114 1.058150570000001)
```
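One variable worth isolating in the benchmarks above is the string itself: `buffer-string` returns a propertized copy of the buffer text, while `buffer-substring-no-properties` returns a bare one. A minimal check (my addition; it assumes nothing about the root cause and only tests whether text properties contribute to the gap):

```elisp
;; Compare parsing a bare string against the propertized one
;; returned by `buffer-string'.  Assumes large.json is already
;; visited, as in the benchmarks above.
(with-current-buffer "large.json"
  (let ((plain (buffer-substring-no-properties (point-min) (point-max))))
    (benchmark-run 10 (json-parse-string plain))))
```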

Thanks,
Sebastien Chapuis

On Sun, Apr 15, 2018 at 23:19, Eli Zaretskii <eliz@gnu.org> wrote:
>
> > From: Sebastien Chapuis <sebastien@chapu.is>
> > Cc: 31138@debbugs.gnu.org
> > Date: Sun, 15 Apr 2018 16:40:18 +0200
> >
> >
> > > I'm surprised that the slowdown due to the conversion is so large,
> > > though.  It doesn't feel right, even with a 4MB string.
> >
> > I've dug in a bit to find out why it is so slow, and I've found that
> > wrapping `json-parse-string` in a `with-temp-buffer` makes it much
> > faster:
> >
> > results of benchmark-run with a string of 4043212 characters
> > ```
> > (with-temp-buffer (json-parse-string str)):
> > (0.814315554 1 0.11941178500000005)
> >
> > (json-parse-string str):
> > (11.542233167 1 0.14954429599999997)
> >
> > (with-temp-buffer (json-read-from-string str)):
> > (5.9781185610000005 29 4.967349412000001)
> >
> > (json-read-from-string str):
> > (5.601267 24 4.723292248000001)
> > ```
>
> Interesting.
>
> > Any idea why?
>
> Where did str come from?  Did you insert it into the buffer or
> something?  Could that explain the difference in performance?
>
> More generally, can you post the string you are using for the
> benchmarking, and the benchmark code as well?  That would make the
> discussion less abstract.
>
> > > Yes, it's necessary, because the input string may include raw bytes,
> > > which will crash Emacs if not handled properly.
> >
> > The Jansson documentation guarantees that the strings returned
> > from the library are always UTF-8 encoded [1].
>
> You assume that the library has no bugs, yes?  Because if it does,
> then we might crash Emacs by trusting it so much.  Letting invalid
> bytes creep into Emacs buffers and strings is a sure recipe for an
> eventual crash.
>
> > By knowing that guarantee, is it possible to reconsider the use of
> > code_convert_string ?
>
> Since it's already much faster than a Lisp implementation, why would
> we want to risk crashing an Emacs session by omitting the decoding?
>
> > Encoding a string to UTF-8 that is already UTF-8 encoded seems
> > useless.
>
> It's decoding, not encoding, and the process of decoding examines
> every sequence in the byte stream and ensures they are valid UTF-8.
>
> Emacs never trusts any external data to be what the user or Lisp tell
> it is; I see no reason why we should make an exception in this
> particular case.
>
> Thanks.
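Eli's point about decoding can be seen directly: `decode-coding-string` examines every byte sequence, and an invalid byte survives as a raw-byte character rather than letting a malformed sequence leak into a multibyte string (a small sketch of mine, not from the thread):

```elisp
;; #xFF can never appear in valid UTF-8; decoding keeps it as a
;; raw-byte character instead of trusting the input blindly.
(decode-coding-string (unibyte-string ?a #xFF ?b) 'utf-8)
```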
