emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: JSON/YAML/TOML/etc. parsing performance


From: Paul Eggert
Subject: Re: JSON/YAML/TOML/etc. parsing performance
Date: Thu, 5 Oct 2017 18:58:34 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.3.0

On 10/05/2017 12:12 AM, Eli Zaretskii wrote:
If ptrdiff_t overflows are reliably diagnosed, then why do we have to
test for them explicitly in our code, as in the proposed json.c?
They're diagnosed only if one compiles with debugging flags like -fsanitize=undefined. And even then the checks are "reliable" only in some sense: some overflows at the source level are not caught at the machine level even if the code is executed, because the overflows are optimized away. So testing a program with -fsanizitize=undefined does not guarantee that the same test will avoid ptrdiff_t overflow on some other platform.

AFAIU, ptrdiff_t overflows are the _only_ reason for json.c checks
whether a size_t value is too large

In the most recent patch I proposed, there were only two such checks; there were two other checks for too-large size_t that were fixnum checks, not ptrdiff_t checks.

However, I can see that you really don't like those checks. So I tweaked the proposed patch to remove them all from json.c. Please see the attached 3 patches (the first is just Philipp's patch rebased to master). The basic idea here is that json.c should be using xmalloc for allocation anyway, for reasons other than size overflow checking. And once it uses the Emacs malloc we can make sure that it never allocates objects that are too large for ptrdiff_t.

I'm not arguing for general replacement of ptrdiff_t with size_t, only
for doing that in those primitives where negative values are a clear
mistake/bug.

This is exactly where we should be cautious about using unsigned types. The longstanding Emacs style is to prefer signed integers, to avoid the common typos that occur with unsigned. If we start changing the style, these primitives (or people debugging these primitives) often won't be able to distinguish buggy from valid-but-enormous cases. And when we compile such primitives (or their callers) with -fsanitize=undefined, we won't be able to catch integer-overflow bugs automatically at runtime, since unsigned integer arithmetic silently wraps around even when -fsanitize=undefined is used.

"kids, don't do that at home -- we are trained professionals"

I help maintain several GNU programs that use unsigned types for sizes, and find that style to be trickier than the style Emacs uses, with respect to integer-overflow bugs. I've been gradually changing some of the non-Emacs GNU code to use signed types, and the results have been encouraging: the code is more readable and more obviously correct. I would rather not go back to the unsigned style: although you're right that it can be done, it is too error-prone and the errors can lead to serious bugs. Even for trained professionals, this particular set of acrobatics is best done with a net.

Attachment: 0001-Implement-native-JSON-support-using-Jansson.patch
Description: Text Data

Attachment: 0002-Do-not-malloc-more-than-PTRDIFF_MAX.patch
Description: Text Data

Attachment: 0003-Minor-JSON-cleanups-mostly-for-overflow.patch
Description: Text Data


reply via email to

[Prev in Thread] Current Thread [Next in Thread]