emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: JSON/YAML/TOML/etc. parsing performance


From: Philipp Stephani
Subject: Re: JSON/YAML/TOML/etc. parsing performance
Date: Sun, 08 Oct 2017 23:04:10 +0000



Eli Zaretskii <address@hidden> schrieb am Do., 5. Okt. 2017 um 09:12 Uhr:
> Cc: address@hidden, address@hidden
> From: Paul Eggert <address@hidden>
> Date: Wed, 4 Oct 2017 14:24:59 -0700
>
> On 10/04/2017 12:38 PM, Eli Zaretskii wrote:
> > if we did use size_t for the arguments which can clearly only be
> > non-negative, the problems which we are discussing would not have
> > happened
> Sure, but we would also have worse problems, as size_t is inherently
> more error-prone. ptrdiff_t overflows are reliably diagnosed when Emacs
> is compiled with suitable GCC compiler options. size_t overflows cannot
> be diagnosed, are all too common, and can cause serious trouble.

If ptrdiff_t overflows are reliably diagnosed, then why do we have to
test for them explicitly in our code, as in the proposed json.c?
AFAIU, ptrdiff_t overflows are the _only_ reason for json.c checks
whether a size_t value is too large, because similar checks for
ptrdiff_t values are already in the low-level subroutines involved in
creating Lisp objects.  So why couldn't those checks be avoided by
simply assigning to a ptrdiff_t variables?

Signed integer overflow is only diagnosed when compiling with -ftrapv or -fsanitize=undefined, which normally doesn't happen in production builds.
IIUC lossy numeric conversions are never diagnosed because they are not undefined behavior, so we always need to check for them explicitly.
We also need to strictly distinguish between the case where an overflow is a bug (incorrect assumptions in the Emacs source code itself) and the case where it is a legitimate outcome due to user input (in this case, return values from Jansson that overflow ptrdiff_t). The two cases need to be treated differently: In the first case we can say it's UB and rely on (hopefully regular and comprehensive) UBSan runs, in the second case we need explicit checks and normal signalling as opposed to assertions.
 

> The Emacs internals occasionally use size_t because underlying
> primitives like 'malloc' do, so we do make some exceptions. Perhaps
> there should be an exception here, for convenience with the JSON
> library. The code snippets I've seen so far in this thread are not
> enough context to judge whether an exception would be helpful in this
> case. Generally speaking, though, unsigned types should be avoided
> because they are more error-prone. This has long been the style in Emacs
> internals, and it's served us well.

I'm not arguing for general replacement of ptrdiff_t with size_t, only
for doing that in those primitives where negative values are a clear
mistake/bug.

For example, let's take this case from your proposed changes:

   static Lisp_Object
  -json_make_string (const char *data, ptrdiff_t size)
  +json_make_string (const char *data, size_t size)
   {
  +  if (PTRDIFF_MAX < size)
  +    string_overflow ();
     return make_specified_string (data, -1, size, true);
   }

If we were to change make_specified_string (and its subroutines, like
make_uninit_multibyte_string etc.) to accept a size_t value in its 3rd
argument, the need for the above check against PTRDIFF_MAX would
disappear.

It wouldn't disappear, it would merely be shifted around. Arguments could be made for either choice, but somewhere the check needs to happen.

  It would also make the higher-level code more
reliable, because application-level programmers will not need to
understand all the non-trivial intricacies of this stuff. 

Every C programmer needs to understand these issues, no matter what codebase they are working with. Lossy integral conversions are fundamental design choices of the C language that can't be avoided. 

C is a nasty language full of traps. You can try to hide some of the traps, but you can't remove them.
(Arguably I've yet to come across a language that doesn't have nasty integer types. Python 3, with integers auto-upgrading to bignums, might be closest.)

reply via email to

[Prev in Thread] Current Thread [Next in Thread]