emacs-devel
Re: I created a faster JSON parser


From: Christopher Wellons
Subject: Re: I created a faster JSON parser
Date: Sun, 10 Mar 2024 12:54:13 -0400
User-agent: NeoMutt/20170113 (1.7.2)

> I'd be glad if you could give some advice: which fuzz-testing framework to use, which introductory material is worth reading, etc.

The Jansson repository has a libFuzzer-based fuzz test, which is perhaps a useful example. In it they define LLVMFuzzerTestOneInput, a function that accepts a buffer of input (pointer and length) and feeds it into the code under test. That's basically it. In the new parser that buffer would go into json_parse. The tested code is instrumented, and the fuzz tester observes the effect inputs have on control flow, using that information to construct new inputs that explore new execution paths, trying to exercise as many as possible.
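A minimal entry point is only a few lines, for example (the json_parse signature here is just a stand-in, not the actual interface in your patch):

    /* Sketch of a libFuzzer harness; json_parse is a stand-in for the
       real parser entry point.  Build the parser and this file with
       clang -g -fsanitize=fuzzer,address,undefined.  */
    #include <stddef.h>
    #include <stdint.h>

    extern void json_parse (const char *input, size_t len);  /* stand-in */

    int
    LLVMFuzzerTestOneInput (const uint8_t *data, size_t size)
    {
      /* Hand the raw bytes to the parser and ignore the result; the
         sanitizers and assertions do the actual checking.  */
      json_parse ((const char *) data, size);
      return 0;
    }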

I'm partial to AFL++, and it's what I reach for first. It also works with GCC. It has two modes, with persistent mode preferred:

https://github.com/AFLplusplus/AFLplusplus/blob/stable/instrumentation/README.persistent_mode.md

Same in principle, but with control inverted. For seed inputs, a few small JSON documents exercising the parser's features are sufficient. In either case, use -fsanitize=address,undefined so that defective execution paths are more likely to be detected. More assertions would help, too, such as "assert(input_current <= input_end)" in a number of places. Assertions must abort or trap so that the fuzz tester knows it found a defect.
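In rough outline, a persistent-mode harness is just a loop around the same call. Only the __AFL_* macros below are real AFL++ interfaces (from the document linked above); json_parse is still the stand-in from the earlier sketch:

    /* Sketch of an AFL++ persistent-mode harness.  Compile with
       afl-clang-fast (or afl-clang-lto) plus -fsanitize=address,undefined,
       then run under afl-fuzz with a directory of small JSON seeds.  */
    #include <stddef.h>
    #include <stdint.h>

    extern void json_parse (const char *input, size_t len);  /* stand-in */

    __AFL_FUZZ_INIT ();

    int
    main (void)
    {
      unsigned char *buf = __AFL_FUZZ_TESTCASE_BUF;
      while (__AFL_LOOP (10000))   /* many test cases per process */
        {
          size_t len = __AFL_FUZZ_TESTCASE_LEN;
          json_parse ((const char *) buf, len);
        }
      return 0;
    }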

Fuzz testing works better in a narrow scope. Ideally only the code being tested is instrumented. If the parser runs within an Emacs context and you instrument all of Emacs, the fuzz tester will explore paths in Emacs reachable through the JSON parser rather than focusing on the parser itself, wasting time that could instead be spent exploring the parser.

You don't need to allocate Lisp objects during fuzz testing. In fact, you should avoid it, because that would just slow things down. (I'd even bet it's the bottleneck in the new parser.) Ideally the core parser consumes bytes and produces JSON events, and is agnostic to its greater context. To integrate with Emacs, you'd have additional, separate code that turns JSON events into Lisp objects, and which wouldn't be fuzz tested.
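To make the shape concrete, something along these lines, where every name is invented for illustration rather than taken from the patch:

    /* Sketch of a context-free, event-pulling core.  All names here are
       hypothetical; the point is that nothing in this interface knows
       about Lisp objects or the rest of Emacs.  */
    #include <stddef.h>

    enum json_event
      {
        JSON_END,        /* whole input consumed successfully */
        JSON_ERROR,      /* malformed input */
        JSON_NULL, JSON_FALSE, JSON_TRUE,
        JSON_NUMBER, JSON_STRING,
        JSON_BEGIN_ARRAY, JSON_END_ARRAY,
        JSON_BEGIN_OBJECT, JSON_END_OBJECT,
      };

    struct json_parser
      {
        const unsigned char *current, *end;  /* input cursor */
        int available_depth;                 /* nesting budget */
        /* ...value of the most recent NUMBER/STRING event... */
      };

    void json_parser_init (struct json_parser *, const void *buf, size_t len);
    enum json_event json_next_event (struct json_parser *);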

Written that way, I could hook this core up to one of the above fuzz test interfaces, mock out whatever bits of Emacs might still be there (e.g. ckd_mul: the isolation need not be perfect), feed it the input, and pump events until either error (i.e. bad input detected, which is ignored) or EOF. The fuzz tester uses a timeout to detect infinite loops, which AFL++ will report as "hangs" and save the input for manual investigation. It should exercise JSON numeric parsing, too, at least to the extent that it's not punted to Emacs or strtod (mind your locale!). I'd also make available_depth much smaller so that the fuzzing could exercise failing checks.
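In terms of that hypothetical interface, the harness body is just the pump loop:

    /* Pump events until the parser reports an error or the end of
       input; names are from the sketch above, not the actual patch.  */
    struct json_parser p;
    json_parser_init (&p, data, size);
    for (;;)
      {
        enum json_event ev = json_next_event (&p);
        if (ev == JSON_ERROR || ev == JSON_END)
          break;   /* bad input is expected and ignored; only crashes,
                      sanitizer reports, and hangs count as findings */
      }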

To get the bulk of the value, the fuzz test does not necessarily need to be checked into source control, or even run as part of a standard test suite. Given a clean, decoupled interface and implementation, it would only take a few minutes to hook up a fuzz test. I was hoping to find just that, but each JSON function has multiple points of contact with Emacs, most especially json_parse_object.

I've done such ad-hoc fuzz testing on dozens of programs and libraries to evaluate their quality, and sometimes even to improve them. In most cases, if the code can be fuzz tested and has never been fuzz tested before, this technique finds fresh bugs in a matter of minutes, if not seconds. When I say it's incredibly effective, I mean it! Case in point from a few weeks ago, under similar circumstances, which can also serve as a practical example:

https://github.com/editorconfig/editorconfig-core-c/pull/103


