Re: I created a faster JSON parser


From: Eli Zaretskii
Subject: Re: I created a faster JSON parser
Date: Fri, 08 Mar 2024 18:22:53 +0200

> From: Herman, Géza <geza.herman@gmail.com>
> Cc: Géza Herman <geza.herman@gmail.com>,
>  emacs-devel@gnu.org
> Date: Fri, 08 Mar 2024 16:20:40 +0100
> 
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >  . The way you break a long line at the equals sign '=' is another
> >    (we break after '=', not before).
> I used clang-format to format my code (I use a completely 
> different coding style).  I see that clang-format is configured 
> this way in Emacs.  Shouldn't BreakBeforeBinaryOperators be set to 
> None or NonAssignment in .clang-format?

Actually, I see we use both styles, so I guess you can disregard that
part.

> >  . The code which handles integers seems to assume that 'unsigned
> >    long' is a 64-bit type?  If so, this is not true on Windows;
> >    please see how we handle this elsewhere in Emacs, in particular
> >    in the WIDE_EMACS_INT case.
> That was a mistake on my part, though a different (but similar) 
> one.  I originally used a 64-bit type, but then changed it to 
> long, because of 32-bit architectures.  The idea is to use a type 
> which likely has the same size as a CPU register.

If you want to use a 32-bit type, use 'int' or 'unsigned int'.
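
To illustrate the portability point, here is a minimal sketch of digit
accumulation that does not depend on the width of 'long'.  It uses the
standard 'uint64_t' from <stdint.h> rather than Emacs's own integer
types, and the helper name 'parse_u64_digits' is hypothetical; it is
not taken from the patch under review or from the Emacs sources.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of Emacs: accumulate LEN decimal
   digits from DIGITS into *RESULT.  Return false if LEN is zero, if a
   non-digit is seen, or if the value does not fit in 64 bits.  */
static bool
parse_u64_digits (const char *digits, size_t len, uint64_t *result)
{
  if (len == 0)
    return false;

  uint64_t value = 0;
  for (size_t i = 0; i < len; i++)
    {
      if (digits[i] < '0' || digits[i] > '9')
        return false;
      uint64_t d = (uint64_t) (digits[i] - '0');
      /* Check before multiplying, so the arithmetic never wraps.  */
      if (value > (UINT64_MAX - d) / 10)
        return false;
      value = value * 10 + d;
    }
  *result = value;
  return true;
}
```

Because the width is spelled out in the type, the overflow check
behaves the same way on platforms where 'unsigned long' is 32 bits
wide and on those where it is 64 bits wide.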

> > A more general comment is that you seem to be parsing buffer text
> > assuming it's UTF-8?  If so, this is not accurate, as the internal
> > representation is a superset of UTF-8, and can represent
> > characters above 0x10FFFF.
>
> When does a buffer have characters above 0x10ffff?

See the node "Text Representations" in the ELisp manual, it explains
that.

> I supposed that a JSON shouldn't contain characters that are out of
> range.  But if the solution is to just remove the upper-range
> comparison, I can do that easily.

We need to decide what to do with characters outside of the Unicode
range.  The encoding of those characters is different, btw, it doesn't
follow the UTF-8 scheme.

In any case, the error in those cases cannot be just "json parse
error", it must be something more self-explanatory.
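
As a rough sketch of what a more specific diagnostic could look like,
the following classifies the first byte of a sequence according to
standard UTF-8 and returns a distinct message for each failure.  The
function names and the message texts are hypothetical, and the sketch
deliberately says nothing about how Emacs encodes characters outside
the Unicode range; it only detects bytes that standard UTF-8 never
uses as a lead byte.

```c
/* Hypothetical helper: return the length (1..4) of the UTF-8 sequence
   that LEAD would start, or 0 if LEAD cannot start a well-formed
   UTF-8 sequence.  A lead byte of 0xF4 still needs its second byte
   checked, since 0xF4 0x90.. would encode a value above 0x10FFFF.  */
static int
utf8_seq_length (unsigned char lead)
{
  if (lead < 0x80)
    return 1;
  if (lead >= 0xC2 && lead <= 0xDF)
    return 2;
  if (lead >= 0xE0 && lead <= 0xEF)
    return 3;
  if (lead >= 0xF0 && lead <= 0xF4)
    return 4;
  return 0;
}

/* Hypothetical helper: describe why LEAD was rejected, so the caller
   can signal something more specific than a generic parse error.  */
static const char *
describe_bad_lead (unsigned char lead)
{
  if (utf8_seq_length (lead) != 0)
    return "byte is a valid UTF-8 lead byte";
  if (lead >= 0x80 && lead <= 0xBF)
    return "stray UTF-8 continuation byte";
  return "byte never starts a UTF-8 sequence "
         "(possibly a character outside the Unicode range)";
}
```

Whether such characters should be rejected, passed through, or
replaced is the open question above; the sketch only shows how the
error text could say which case was hit.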


