Re: JSON/YAML/TOML/etc. parsing performance
From: Eli Zaretskii
Subject: Re: JSON/YAML/TOML/etc. parsing performance
Date: Thu, 05 Oct 2017 10:12:30 +0300

> Cc: address@hidden, address@hidden
> From: Paul Eggert <address@hidden>
> Date: Wed, 4 Oct 2017 14:24:59 -0700
>
> On 10/04/2017 12:38 PM, Eli Zaretskii wrote:
> > if we did use size_t for the arguments which can clearly only be
> > non-negative, the problems which we are discussing would not have
> > happened
> Sure, but we would also have worse problems, as size_t is inherently
> more error-prone. ptrdiff_t overflows are reliably diagnosed when Emacs
> is compiled with suitable GCC compiler options. size_t overflows cannot
> be diagnosed, are all too common, and can cause serious trouble.

If ptrdiff_t overflows are reliably diagnosed, then why do we have to
test for them explicitly in our code, as in the proposed json.c?
AFAIU, ptrdiff_t overflows are the _only_ reason json.c checks
whether a size_t value is too large, because similar checks for
ptrdiff_t values are already in the low-level subroutines involved in
creating Lisp objects. So why couldn't those checks be avoided by
simply assigning to a ptrdiff_t variable?

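To make the contrast in the quoted paragraph concrete, here is a minimal standalone sketch (not Emacs code). It assumes GCC or Clang, since it relies on the __builtin_add_overflow extension, and the function names are made up for illustration. Unsigned addition wraps silently by definition, whereas signed overflow is undefined behavior that tools can trap, for instance with GCC's -fsanitize=undefined or -ftrapv (which options are meant above is not stated in this thread).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: unsigned addition is defined to wrap modulo
   SIZE_MAX + 1, so no compiler option can flag it as an error.  */
static size_t
wrapping_size_add (size_t a, size_t b)
{
  return a + b;
}

/* Hypothetical helper: add two ptrdiff_t values, returning false
   instead of overflowing.  __builtin_add_overflow is a GCC/Clang
   extension, not ISO C.  */
static bool
checked_ptrdiff_add (ptrdiff_t a, ptrdiff_t b, ptrdiff_t *result)
{
  return !__builtin_add_overflow (a, b, result);
}
```

The first helper quietly returns a small number when its operands are too large; the second reports the problem to its caller, which is roughly what an explicit check in C code has to do when no sanitizer is active.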
> The Emacs internals occasionally use size_t because underlying
> primitives like 'malloc' do, so we do make some exceptions. Perhaps
> there should be an exception here, for convenience with the JSON
> library. The code snippets I've seen so far in this thread are not
> enough context to judge whether an exception would be helpful in this
> case. Generally speaking, though, unsigned types should be avoided
> because they are more error-prone. This has long been the style in Emacs
> internals, and it's served us well.

I'm not arguing for general replacement of ptrdiff_t with size_t, only
for doing that in those primitives where negative values are a clear
mistake/bug.

For example, let's take this case from your proposed changes:

   static Lisp_Object
  -json_make_string (const char *data, ptrdiff_t size)
  +json_make_string (const char *data, size_t size)
   {
  +  if (PTRDIFF_MAX < size)
  +    string_overflow ();
     return make_specified_string (data, -1, size, true);
   }

If we were to change make_specified_string (and its subroutines, like
make_uninit_multibyte_string etc.) to accept a size_t value in its 3rd
argument, the need for the above check against PTRDIFF_MAX would
disappear.
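As a rough illustration of that idea, here is a standalone sketch in which the range check lives in the low-level constructor, so the JSON-side caller passes its size_t straight through. Everything named toy_* is a hypothetical stand-in, not the real make_specified_string or string_overflow; in particular, the real code would signal a Lisp error instead of returning an invalid object.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a Lisp string; not an Emacs type.  */
struct toy_string
{
  const char *data;
  ptrdiff_t size;
  bool valid;
};

/* Hypothetical low-level constructor that takes size_t and performs
   the range check itself, once, for every caller.  */
static struct toy_string
toy_make_string (const char *data, size_t size)
{
  struct toy_string s = { data, 0, false };
  if (PTRDIFF_MAX < size)
    return s;	/* Real code would call string_overflow () here.  */
  s.size = (ptrdiff_t) size;
  s.valid = true;
  return s;
}

/* A caller that gets a size_t from an external library no longer
   needs its own PTRDIFF_MAX test.  */
static struct toy_string
toy_json_make_string (const char *data, size_t size)
{
  return toy_make_string (data, size);
}
```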

Another such case is 'insert', which is also used in json.c, and
requires a similar check:

  void
  insert (const char *string, ptrdiff_t nbytes)
  {
    if (nbytes > 0)
      {
        ptrdiff_t len = chars_in_text ((unsigned char *) string, nbytes),
          opoint;
        insert_1_both (string, len, nbytes, 0, 1, 0);
        opoint = PT - len;
        signal_after_change (opoint, 0, len);
        update_compositions (opoint, PT, CHECK_BORDER);
      }
  }

It clearly ignores negative values of nbytes, as expected. So why not
make nbytes a size_t argument? (We will probably need some low-level
changes inside the subroutines of insert_1_both, like move_gap, to
reject too large size_t values before we convert them to signed
values, but that's hardly rocket science.)
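A sketch of what "reject too large size_t values before we convert them" could look like, using a toy fixed-size buffer instead of the real gap buffer. The names toy_buffer and toy_insert, the capacity, and the int return convention are all assumptions made for this example; the real subroutines would signal an error.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { TOY_CAPACITY = 64 };

/* Hypothetical toy buffer; not the Emacs buffer structure.  */
struct toy_buffer
{
  char text[TOY_CAPACITY];
  ptrdiff_t used;
};

/* Mirrors the shape of 'insert' with a size_t byte count.  The count
   is rejected up front if it cannot be represented as ptrdiff_t,
   then converted once for the signed arithmetic that follows.
   Returns 0 on success, -1 if the count was rejected.  */
static int
toy_insert (struct toy_buffer *buf, const char *string, size_t nbytes)
{
  if (PTRDIFF_MAX < nbytes)
    return -1;
  ptrdiff_t n = (ptrdiff_t) nbytes;
  if (TOY_CAPACITY - buf->used < n)
    return -1;
  memcpy (buf->text + buf->used, string, nbytes);
  buf->used += n;
  return 0;
}
```

With the check placed here, a caller holding a size_t from an external library does not have to compare it against PTRDIFF_MAX itself.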

I envision that all the Fmake_SOMETHING primitives could use similar
changes to have the size specified as size_t, because it can never be
negative. E.g., Fmake_vector is used by json.c and currently requires
a similar check because its size argument is a signed type.

IOW, I'm saying that using size_t judiciously, in a small number of
places, would make a lot of sense and allow us to simplify
higher-level code, and make it faster by avoiding duplicate checks of
the same values. It would also make the higher-level code more
reliable, because application-level programmers will not need to
understand all the non-trivial intricacies of this stuff. As Emacs
starts using more and more external libraries, whether built-in or via
modules, the issue of size_t vs ptrdiff_t will become more and more
important, and a source of more and more error-prone code. Why not
fix that in advance in our primitives?

> (Ironically, just last week I was telling beginning students to beware
> unsigned types, with (0u < -1) as an example....)

Well, "kids, don't do that at home -- we are trained professionals"
seems to apply here ;-)
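
The (0u < -1) example quoted above is easy to verify with a small standalone sketch; the function names are hypothetical. In the mixed comparison the usual arithmetic conversions turn -1 into UINT_MAX, so the test is true, while the all-signed comparison behaves as one would expect. Compilers may warn about the first function, which is rather the point.

```c
#include <stdbool.h>

/* Hypothetical helper: compares unsigned 0 with int -1.  The -1 is
   converted to unsigned int (UINT_MAX) before the comparison.  */
static bool
unsigned_compare_surprise (void)
{
  return 0u < -1;
}

/* Hypothetical helper: the same comparison with both operands signed.  */
static bool
signed_compare (void)
{
  return 0 < -1;
}
```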