emacs-devel
Re: I created a faster JSON parser


From: Christopher Wellons
Subject: Re: I created a faster JSON parser
Date: Sun, 10 Mar 2024 12:54:13 -0400
User-agent: NeoMutt/20170113 (1.7.2)

> I'd be glad if you could give some advice: which fuzz-testing framework to use, which introductory material is worth reading, etc.

The Jansson repository has a libFuzzer-based fuzz test, which is perhaps a useful example. In it they define LLVMFuzzerTestOneInput, a function that accepts a buffer of input (pointer and length) and feeds it into the code under test. That's basically it. In the new parser that buffer would go into json_parse. The tested code is instrumented, and the fuzz tester observes the effect inputs have on control flow, using that information to construct new inputs that explore new execution paths, trying to exercise as many as possible.
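A minimal entry point is only a few lines, for example (the json_parse signature here is just a stand-in, not the actual interface in your patch):

    /* Sketch of a libFuzzer harness; json_parse is a stand-in for the
       real parser entry point.  Build the parser and this file with
       clang -g -fsanitize=fuzzer,address,undefined.  */
    #include <stddef.h>
    #include <stdint.h>

    extern void json_parse (const char *input, size_t len);  /* stand-in */

    int
    LLVMFuzzerTestOneInput (const uint8_t *data, size_t size)
    {
      /* Hand the raw bytes to the parser and ignore the result; the
         sanitizers and assertions do the actual checking.  */
      json_parse ((const char *) data, size);
      return 0;
    }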

I'm partial to AFL++, and it's what I reach for first. It also works with GCC. It has two modes, with persistent mode preferred:

https://github.com/AFLplusplus/AFLplusplus/blob/stable/instrumentation/README.persistent_mode.md

Same in principle, but with control inverted. For seed inputs, a few small JSON documents exercising the parser's features are sufficient. In either case, use -fsanitize=address,undefined so that defective execution paths are more likely to be detected. More assertions would help, too, such as "assert(input_current <= input_end)" in a number of places. Assertions must abort or trap so that the fuzz tester knows it found a defect.
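In rough outline, a persistent-mode harness is just a loop around the same call. Only the __AFL_* macros below are real AFL++ interfaces (from the document linked above); json_parse is still the stand-in from the earlier sketch:

    /* Sketch of an AFL++ persistent-mode harness.  Compile with
       afl-clang-fast (or afl-clang-lto) plus -fsanitize=address,undefined,
       then run under afl-fuzz with a directory of small JSON seeds.  */
    #include <stddef.h>
    #include <stdint.h>

    extern void json_parse (const char *input, size_t len);  /* stand-in */

    __AFL_FUZZ_INIT ();

    int
    main (void)
    {
      unsigned char *buf = __AFL_FUZZ_TESTCASE_BUF;
      while (__AFL_LOOP (10000))   /* many test cases per process */
        {
          size_t len = __AFL_FUZZ_TESTCASE_LEN;
          json_parse ((const char *) buf, len);
        }
      return 0;
    }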

Fuzz testing works better in a narrow scope. Ideally only the code being tested is instrumented. If the parser runs within an Emacs context and you instrument all of Emacs, the fuzz tester will explore paths in Emacs reachable through the JSON parser rather than focusing on the parser itself, wasting time that could instead be spent exploring the parser.

You don't need to allocate Lisp objects during fuzz testing. In fact, you should avoid it, because that would just slow things down. (I'd even bet it's the bottleneck in the new parser.) Ideally the core parser consumes bytes and produces JSON events, and is agnostic to its greater context. To integrate with Emacs, you'd have additional, separate code that turns JSON events into Lisp objects, and which wouldn't be fuzz tested.
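To make the shape concrete, something along these lines, where every name is invented for illustration rather than taken from the patch:

    /* Sketch of a context-free, event-pulling core.  All names here are
       hypothetical; the point is that nothing in this interface knows
       about Lisp objects or the rest of Emacs.  */
    #include <stddef.h>

    enum json_event
      {
        JSON_END,        /* whole input consumed successfully */
        JSON_ERROR,      /* malformed input */
        JSON_NULL, JSON_FALSE, JSON_TRUE,
        JSON_NUMBER, JSON_STRING,
        JSON_BEGIN_ARRAY, JSON_END_ARRAY,
        JSON_BEGIN_OBJECT, JSON_END_OBJECT,
      };

    struct json_parser
      {
        const unsigned char *current, *end;  /* input cursor */
        int available_depth;                 /* nesting budget */
        /* ...value of the most recent NUMBER/STRING event... */
      };

    void json_parser_init (struct json_parser *, const void *buf, size_t len);
    enum json_event json_next_event (struct json_parser *);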

Written that way, I could hook this core up to one of the above fuzz test interfaces, mock out whatever bits of Emacs might still be there (e.g. ckd_mul: the isolation need not be perfect), feed it the input, and pump events until either error (i.e. bad input detected, which is ignored) or EOF. The fuzz tester uses a timeout to detect infinite loops, which AFL++ will report as "hangs" and save the input for manual investigation. It should exercise JSON numeric parsing, too, at least to the extent that it's not punted to Emacs or strtod (mind your locale!). I'd also make available_depth much smaller so that the fuzzing could exercise failing checks.
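In terms of that hypothetical interface, the harness body is just the pump loop:

    /* Pump events until the parser reports an error or the end of
       input; names are from the sketch above, not the actual patch.  */
    struct json_parser p;
    json_parser_init (&p, data, size);
    for (;;)
      {
        enum json_event ev = json_next_event (&p);
        if (ev == JSON_ERROR || ev == JSON_END)
          break;   /* bad input is expected and ignored; only crashes,
                      sanitizer reports, and hangs count as findings */
      }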

To get the bulk of the value, the fuzz test does not necessarily need to be checked into source control, or even run as part of a standard test suite. Given a clean, decoupled interface and implementation, it would only take a few minutes to hook up a fuzz test. I was hoping to find just that, but each JSON function has multiple points of contact with Emacs, most especially json_parse_object.

I've done such ad-hoc fuzz testing on dozens of programs and libraries to evaluate their quality, and sometimes even to improve them. In most cases, if the code can be fuzz tested and has never been fuzz tested before, this technique finds fresh bugs in a matter of minutes, if not seconds. When I say it's incredibly effective, I mean it! Case in point from a few weeks ago, under similar circumstances, which can also serve as a practical example:

https://github.com/editorconfig/editorconfig-core-c/pull/103


