emacs-devel

Re: I created a faster JSON parser


From: Herman, Géza
Subject: Re: I created a faster JSON parser
Date: Mon, 11 Mar 2024 15:35:45 +0100


Mattias Engdegård <mattias.engdegard@gmail.com> writes:

> On 11 March 2024 at 14:29, Eli Zaretskii <eliz@gnu.org> wrote:

>> What you describe are possible fallbacks, but I would prefer not to use any fallback at all, but instead have a full C implementation.

> Yes, I definitely think we should do that. I'm pretty sure that writing a JSON unparser is a lot easier than doing the parser, and the extra speed we stand to gain from not having the intermediate jansson step is not without interest.

FYI: I checked out a JSON benchmark, and it turned out that jansson is not a fast parser; there are faster libraries. A library with a SAX interface could be useful for Emacs. According to https://github.com/miloyip/nativejson-benchmark, RapidJSON is at least 10x faster than jansson. I'm just saying this because Emacs doesn't have to stick with my parser: there are possible alternatives, and they have JSON serializers as well.

(But note: I am happy to bring my parser into a mergeable state and, if it eventually gets merged, to fix its bugs, but I'm not motivated to work on integrating other JSON libraries.)

> Overall the proposed parser looks fine, nothing terribly wrong that can't be fixed later on. A few minor points:

> * The `is_single_uninteresting` array is hard to review and badly formatted. It appears to be 1 for all printable ASCII plus DEL except double-quote and backslash. (Why DEL?)

Yep, the formatting of that table got destroyed when I reformatted the code into GNU style. I have now restored the table's layout and added comments for each row/column. Here's the latest version: https://github.com/geza-herman/emacs/commit/4b5895636c1ec06e630baf47881b246c198af056.patch

I'm not sure about DEL: I haven't seen anything which says that it's an invalid character in a string, so the parser currently allows it.

> * Do you really need to maintain line and column during the parse? If you want them for error reporting, you can materialise them from the offset that you already have.

Yeah, I thought of that, but it turned out that maintaining the line/column doesn't have an impact on performance. It was easy to add, though admittedly it's a little awkward to maintain these variables. If Emacs has a way to get the line/column position from a byte pointer (both for strings and buffers), I am happy to use that instead. It would be a better solution, because currently the parser always starts from line 1, column 1, which means that if json-parse-buffer is used, these numbers are relative to the start of the current parse, not to the whole buffer. But as the jansson-based parser behaves the same, I thought it was OK.

> * Are you sure that GC can't run during parsing or that all your Lisp objects are reachable directly from the stack? (It's the `object_workspace` in particular that's worrying me a bit.)

That's a very good question. I suppose object_workspace is invisible to the Lisp VM, as it is just a malloc'd object. But I've never seen a problem because of this. What triggers the GC? Is it possible that GC never gets triggered for the whole duration of the parse? Otherwise it should have GC'd the objects in object_workspace, causing problems. (I tried this parser in a loop in which GC was triggered hundreds of times, comparing the result to json-read on each iteration, and everything was fine.)



