From: Herman, Géza
Subject: Re: I created a faster JSON parser
Date: Sun, 10 Mar 2024 07:58:12 +0100
Christopher Wellons <wellons@nullprogram.com> writes:
What do you think?
In review I noticed a potential pointer overflow in json_parse_string:
    parser->input_current + 4 <= parser->input_end
[...]
In json_make_object_workspace_for and json_byte_workspace_put, a size is doubled without an overflow check ("new_workspace_size * 2").
Thanks for the review and finding these problems! I fixed them: https://github.com/geza-herman/emacs/commit/cbbf3dd494034750ff324703e64f1125a1056832.patch
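For readers following along, here is a minimal sketch of the two kinds of check discussed above. It is not the code from the linked patch; the helper names `has_bytes` and `double_size` are hypothetical, and `size_t` is an assumption about the size type. The idea is to compare the remaining length instead of forming a pointer that may lie past the end of the buffer, and to test against half the maximum before doubling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if at least n bytes remain in [cur, end).
   Comparing the remaining length avoids computing cur + n, which is
   undefined behavior in C when it points beyond one past the end of
   the buffer.  */
bool
has_bytes (const unsigned char *cur, const unsigned char *end, size_t n)
{
  assert (cur <= end);
  return (size_t) (end - cur) >= n;
}

/* Hypothetical helper: double *size, reporting failure instead of
   silently wrapping around.  On failure *size is left unchanged.  */
bool
double_size (size_t *size)
{
  if (*size > SIZE_MAX / 2)
    return false;
  *size *= 2;
  return true;
}
```

With helpers like these, the check quoted above would read as `has_bytes (parser->input_current, parser->input_end, 4)`, and a workspace would only grow when `double_size` reports success.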
But this JSON parser is tightly coupled with the Emacs Lisp runtime, which greatly complicates things. I couldn't simply pluck it out by itself and drop it in, say, AFL++.
Yes, it needs some work. Lisp Object creation happens only at a few very specific places, so it's easy to remove (actually, I wrote this parser outside of Emacs, and then just put it in by adding the necessary Lisp Object creation code). Or, if the fuzzer needs the actual output (I mean, the result of the parsing), it shouldn't be too hard to put some code there which provides the output. The other thing is error handling, but that can also be easily replaced by using longjmp.
I'm happy to do this work, I'd just need some directions on how to do it. I'm not experienced with fuzz testing, so if you are, I'd be glad if you could give some advice: which fuzz-testing framework to use, which introductory material is worth reading, etc.
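As a rough illustration of the standalone setup described above, the sketch below replaces error signaling with setjmp/longjmp and exposes the parser through `LLVMFuzzerTestOneInput`, the entry point that libFuzzer expects. Everything prefixed `toy_` is hypothetical: the grammar (nested square brackets only) merely stands in for the real JSON parser, and the depth limit of 64 is an arbitrary assumption.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the real parser state.  */
struct toy_parser
{
  const uint8_t *cur;
  const uint8_t *end;
  jmp_buf on_error;   /* takes the place of Lisp-level error signaling */
};

/* Abort the current parse and jump back to the setjmp site.  */
static void
toy_fail (struct toy_parser *p)
{
  longjmp (p->on_error, 1);
}

/* Toy grammar: value := '[' value* ']'.  Nesting is capped at an
   arbitrary depth so hostile input cannot exhaust the stack.  */
static void
toy_parse_value (struct toy_parser *p, int depth)
{
  if (depth > 64 || p->cur == p->end || *p->cur != '[')
    toy_fail (p);
  p->cur++;
  while (p->cur != p->end && *p->cur == '[')
    toy_parse_value (p, depth + 1);
  if (p->cur == p->end || *p->cur != ']')
    toy_fail (p);
  p->cur++;
}

/* Return 0 if the whole input is one valid value, -1 otherwise.  */
int
toy_parse (const uint8_t *data, size_t size)
{
  struct toy_parser p = { data, data + size };
  if (setjmp (p.on_error))
    return -1;                /* reached via longjmp from toy_fail */
  toy_parse_value (&p, 0);
  if (p.cur != p.end)
    toy_fail (&p);            /* trailing bytes after the value */
  return 0;
}

/* libFuzzer entry point: feed every input to the parser.  Rejected
   input is not a bug; crashes and sanitizer reports are.  */
int
LLVMFuzzerTestOneInput (const uint8_t *data, size_t size)
{
  toy_parse (data, size);
  return 0;
}
```

Under these assumptions, a harness of this shape can be built with something like `clang -g -fsanitize=fuzzer,address harness.c`, where `harness.c` is a placeholder file name.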
As noted earlier, the parser gets its performance edge through skipping the intermediate steps. This is great! That could still be accomplished without such tight coupling, allowing for performance *and* an interface that is testable and fuzzable in relative isolation.
Yes, I think a SAX-parser-like interface would have very little cost. But honestly, I don't see the point of it. This is a parser for Emacs only. It has a very specific purpose: to make JSON parsing fast in Emacs. It is a small module. Input is JSON, output is Lisp Objects. Working with Lisp Objects inside Emacs is a natural thing, and usually there is no need for intermediate representations. So if the only reason to have an Emacs-independent API is to make the parser fuzz-testable, then wouldn't it make more sense to make Emacs fuzz-testable in general? I find this approach more useful, because I think it's not just this parser which can be a sensible target for fuzz testing.
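To make the SAX-like option concrete, here is a small sketch of what such an interface could look like. It is an assumption for illustration only, not a proposal taken from the thread: `struct sax_handler`, `sax_parse` and the `stats` consumer are hypothetical names, and the toy grammar handles only nested arrays of non-negative integers in a NUL-terminated string. The parser reports events through callbacks and never builds values itself, so an Emacs build could create Lisp Objects in the callbacks while a fuzzing build plugs in a trivial consumer.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical SAX-style interface: one callback per parse event.  */
struct sax_handler
{
  void (*begin_array) (void *ctx);
  void (*end_array) (void *ctx);
  void (*number) (void *ctx, long value);
  void *ctx;
};

/* Toy parser for input such as "[1,[2,3]]".  Returns the position
   just after the parsed value, or NULL on a syntax error or when a
   number does not fit in a long.  */
const char *
sax_parse (const char *s, const struct sax_handler *h)
{
  if (*s >= '0' && *s <= '9')
    {
      long v = 0;
      while (*s >= '0' && *s <= '9')
        {
          if (v > (LONG_MAX - 9) / 10)
            return NULL;          /* next digit could overflow */
          v = v * 10 + (*s++ - '0');
        }
      h->number (h->ctx, v);
      return s;
    }
  if (*s != '[')
    return NULL;
  h->begin_array (h->ctx);
  s++;
  if (*s != ']')
    for (;;)
      {
        s = sax_parse (s, h);
        if (s == NULL)
          return NULL;
        if (*s != ',')
          break;
        s++;
      }
  if (*s != ']')
    return NULL;
  h->end_array (h->ctx);
  return s + 1;
}

/* Example consumer for tests: tracks nesting depth and sums numbers.  */
struct stats { int depth; int max_depth; unsigned long sum; };

void
on_begin (void *ctx)
{
  struct stats *st = ctx;
  if (++st->depth > st->max_depth)
    st->max_depth = st->depth;
}

void
on_end (void *ctx)
{
  ((struct stats *) ctx)->depth--;
}

void
on_number (void *ctx, long v)
{
  ((struct stats *) ctx)->sum += (unsigned long) v;
}
```

The cost of this design is one indirect call per event, and the consumer is free to build its result directly, so no intermediate representation is needed.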