Re: I created a faster JSON parser
From: Christopher Wellons
Subject: Re: I created a faster JSON parser
Date: Sun, 10 Mar 2024 12:54:13 -0400
User-agent: NeoMutt/20170113 (1.7.2)
> I'd glad if you can give some advices: which fuzzy-testing framework to
> use, which introductory material is worth reading, etc.
The Jansson repository has a libFuzzer-based fuzz test, which is perhaps a
useful example. In it they define LLVMFuzzerTestOneInput, a function which
accepts a buffer of input (pointer and length), which they feed into the
code under test. That's basically it. In the new parser that buffer would
go into json_parse. The tested code is instrumented, and the fuzz tester
observes the effect inputs have on control flow, using that information to
construct new inputs that explore new execution paths, trying to exercise
as many as possible.
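A minimal sketch of such a harness, in C. The toy_json_parse function here
is a hypothetical stand-in that only checks bracket nesting, so that the
example is self-contained. In a real harness its place would be taken by a
call into the actual parser, whose exact signature is not assumed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the code under test: accepts input whose
 * square brackets are balanced and nested at most 64 deep. Returns 1 on
 * success and 0 on a parse error. */
static int toy_json_parse(const uint8_t *buf, size_t len)
{
    int depth = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '[') {
            if (++depth > 64) return 0;
        } else if (buf[i] == ']') {
            if (--depth < 0) return 0;
        }
        /* Invariant check: a failure here aborts, which the fuzzer sees. */
        assert(depth >= 0 && depth <= 64);
    }
    return depth == 0;
}

/* libFuzzer calls this once per generated input. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    toy_json_parse(data, size);  /* parse errors are expected and ignored */
    return 0;
}
```

With Clang, something like "clang -g -fsanitize=fuzzer,address,undefined
harness.c" links in the libFuzzer driver, which supplies main.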
I'm partial to AFL++, and it's what I reach for first. It also works with
GCC. It has two modes, with persistent mode preferred:
https://github.com/AFLplusplus/AFLplusplus/blob/stable/instrumentation/README.persistent_mode.md
The principle is the same, but with control inverted: your own main loop
asks the fuzzer for each input. For seed inputs, a few small
JSON documents exercising the parser's features are sufficient. In either
case, use -fsanitize=address,undefined so that defective execution paths
are more likely to be detected. More assertions would help, too, such as
"assert(input_current <= input_end)" in a number of places. Assertions
must abort or trap so that the fuzz tester knows it found a defect.
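As a rough illustration of an assertion that traps, here is a sketch in C.
The ASSERT macro, the struct, and next_byte are hypothetical names made up
for this example; only the checked condition comes from the text above.
__builtin_trap is a GCC and Clang builtin, so this assumes one of those
compilers.

```c
#include <stddef.h>

/* Trap on failure, even in builds that define NDEBUG, so that the fuzz
 * tester always observes a crash. */
#define ASSERT(c) do { if (!(c)) __builtin_trap(); } while (0)

/* Hypothetical input cursor; the field names mirror the assertion quoted
 * above, not necessarily the real parser's layout. */
struct input {
    const unsigned char *input_current;
    const unsigned char *input_end;
};

/* Return the next byte, or -1 at end of input. */
static int next_byte(struct input *in)
{
    ASSERT(in->input_current <= in->input_end);
    if (in->input_current == in->input_end) {
        return -1;
    }
    return *in->input_current++;
}
```

The standard assert from <assert.h> also works, since it calls abort, as
long as the fuzzing build does not define NDEBUG.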
Fuzz testing works better in a narrow scope. Ideally only the code being
tested is instrumented. If it's running within an Emacs context, and you
instrument all of Emacs, the fuzz tester would explore paths in Emacs
reachable through the JSON parser rather than focus on the parser itself.
That will waste time that could instead be spent exploring the parser.
You don't need to allocate lisp objects during fuzz testing. In fact, you
should avoid it because that would just slow it down. (I even bet it's the
bottleneck in the new parser.) Ideally the core parser consumes bytes and
produces JSON events, and is agnostic to its greater context. To integrate
with Emacs, you'd have additional, separate code that turns JSON events
into lisp objects, and which wouldn't be fuzz tested.
Written that way, I could hook this core up to one of the above fuzz test
interfaces, mock out whatever bits of Emacs might still be there (e.g.
ckd_mul: the isolation need not be perfect), feed it the input, and pump
events until either error (i.e. bad input detected, which is ignored) or
EOF. The fuzz tester uses a timeout to detect infinite loops; AFL++
reports these as "hangs" and saves the input for manual investigation. The
fuzz test should exercise JSON numeric parsing, too, at least to the extent that
it's not punted to Emacs or strtod (mind your locale!). I'd also make
available_depth much smaller so that the fuzzing could exercise failing
checks.
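To make the "pump events until error or EOF" idea concrete, here is a toy
sketch in C. Everything in it is hypothetical: the event enum, the parser
struct, next_event, and pump are invented for illustration and only
understand square brackets. The point is the shape of the loop and the
deliberately small depth limit, not the parsing itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum event { EV_EOF, EV_ERROR, EV_BEGIN, EV_END, EV_OTHER };

/* Hypothetical core parser state: bytes in, events out, no Emacs. */
struct parser {
    const uint8_t *data;
    size_t size, pos;
    int depth, max_depth;
};

/* Produce the next event. Every non-terminal event consumes one byte,
 * so a pump loop over this function always terminates. */
static enum event next_event(struct parser *p)
{
    assert(p->pos <= p->size);
    if (p->pos == p->size) {
        return p->depth == 0 ? EV_EOF : EV_ERROR;  /* unclosed '[' */
    }
    switch (p->data[p->pos++]) {
    case '[':
        if (p->depth == p->max_depth) return EV_ERROR;  /* too deep */
        p->depth++;
        return EV_BEGIN;
    case ']':
        if (p->depth == 0) return EV_ERROR;  /* unmatched ']' */
        p->depth--;
        return EV_END;
    default:
        return EV_OTHER;
    }
}

/* Pump events until error or EOF and return the terminating event. */
static enum event pump(const uint8_t *data, size_t size)
{
    /* A tiny depth limit lets the fuzzer reach the failing check fast. */
    struct parser p = { data, size, 0, 0, 4 };
    for (;;) {
        enum event e = next_event(&p);
        if (e == EV_EOF || e == EV_ERROR) return e;
    }
}

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    pump(data, size);  /* bad input is an expected outcome, so ignore it */
    return 0;
}
```

The code that turns events into lisp objects would sit on the other side
of next_event and stay out of the fuzz test entirely.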
To get the bulk of the value, the fuzz test does not necessarily need to
be checked into source control, or even run as part of a standard test
suite. Given a clean, decoupled interface and implementation, it would
only take a few minutes to hook up a fuzz test. I was hoping to find just
that, but each JSON function has multiple points of contact with Emacs,
most especially json_parse_object.
I've done such ad-hoc fuzz testing on dozens of programs and libraries to
evaluate their quality, and sometimes even improve them. In most cases, if
the code can be fuzz tested and has never been fuzz tested before, this technique
finds fresh bugs in a matter of minutes, if not seconds. When I say it's
incredibly effective, I mean it! Case in point from a few weeks ago, under
similar circumstances, which can also serve as a practical example:
https://github.com/editorconfig/editorconfig-core-c/pull/103