From: Eric Blake
Subject: Re: [Qemu-devel] [PATCH 25/56] json: Leave rejecting invalid escape sequences to parser
Date: Fri, 10 Aug 2018 10:56:45 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
Both lexer and parser reject invalid escape sequences in strings.  The
parser's check is useless.



Drop the lexer's escape sequence checking, and make it accept the same
characters after '\' it accepts elsewhere in strings.  It now produces

     JSON_LCURLY   {
     JSON_STRING   "address@hidden"
     JSON_COLON    :
     JSON_INTEGER  1
     JSON_RCURLY   }

and the parser reports just

     JSON parse error, invalid escape sequence in string

While there, fix parse_string()'s inaccurate function comment.

Worthwhile improvement.
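
For readers following along, the parser-side check described above can be sketched roughly as follows. This is an illustration only, not code from the patch: decode_simple_escape() is a hypothetical name, and the \uXXXX form is assumed to be handled separately by the caller.

```c
/*
 * Hypothetical helper, not taken from qobject/json-parser.c.
 * @c is the character following a backslash inside a string token.
 * Returns the character the escape stands for, or -1 if @c is not
 * one of the single-character escapes.  Assumption: the caller
 * handles 'u' itself and reports "invalid escape sequence in
 * string" when it gets -1.
 */
static int decode_simple_escape(char c)
{
    switch (c) {
    case '"':  return '"';
    case '\\': return '\\';
    case '/':  return '/';
    case 'b':  return '\b';
    case 'f':  return '\f';
    case 'n':  return '\n';
    case 'r':  return '\r';
    case 't':  return '\t';
    case '\'': return '\'';     /* extension over RFC 7159 */
    default:   return -1;       /* anything else is invalid */
    }
}
```

Because the lexer now hands the whole string to the parser as one token, a -1 here would translate into exactly one error for the string rather than a cascade of lexer errors.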


Signed-off-by: Markus Armbruster <address@hidden>
---
  qobject/json-lexer.c  | 72 +++----------------------------------------
  qobject/json-parser.c | 56 +++++++++++++++++++--------------
  2 files changed, 37 insertions(+), 91 deletions(-)

and shorter!

      [IN_DQ_STRING_ESCAPE] = {
-        ['b'] = IN_DQ_STRING,
-        ['f'] =  IN_DQ_STRING,
-        ['n'] =  IN_DQ_STRING,
-        ['r'] =  IN_DQ_STRING,
-        ['t'] =  IN_DQ_STRING,
-        ['/'] = IN_DQ_STRING,
-        ['\\'] = IN_DQ_STRING,
-        ['\''] = IN_DQ_STRING,
-        ['\"'] = IN_DQ_STRING,
-        ['u'] = IN_DQ_UCODE0,
+        [0x20 ... 0xFD] = IN_DQ_STRING,

Among other things, this means the parser now has to flag "\u" as an incomplete escape - but your added testsuite coverage earlier in the series ensures that we do.
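
To make the "\u" case concrete, here is a minimal sketch of how a parser might insist on four hex digits after "\u". The names hex_value() and parse_u_escape() are made up for this example and are not claimed to match what json-parser.c does.

```c
/* Hypothetical hex-digit helper: returns 0..15, or -1 if not a hex digit. */
static int hex_value(char ch)
{
    if (ch >= '0' && ch <= '9') {
        return ch - '0';
    }
    if (ch >= 'a' && ch <= 'f') {
        return 10 + (ch - 'a');
    }
    if (ch >= 'A' && ch <= 'F') {
        return 10 + (ch - 'A');
    }
    return -1;
}

/*
 * @p points just past the 'u' of a "\u" escape in a NUL-terminated
 * string.  Returns the 16-bit value of the four hex digits, or -1 if
 * fewer than four hex digits follow (an incomplete escape).
 */
static long parse_u_escape(const char *p)
{
    long value = 0;
    int i;

    for (i = 0; i < 4; i++) {
        int digit = hex_value(p[i]);

        if (digit < 0) {
            return -1;          /* also stops at the terminating NUL */
        }
        value = (value << 4) | digit;
    }
    return value;
}
```

The loop stops at the first non-hex character, so a string that ends right after "\u" is rejected without reading past its terminator.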

+++ b/qobject/json-parser.c
@@ -106,30 +106,40 @@ static int hex2decimal(char ch)
  }
  /**
- * parse_string(): Parse a json string and return a QObject
+ * parse_string(): Parse a JSON string
   *
- *  string

+ * From RFC 7159 "The JavaScript Object Notation (JSON) Data
+ * Interchange Format":
+ *
+ *    char = unescaped /
+ *        escape (
+ *            %x22 /          ; "    quotation mark  U+0022
+ *            %x5C /          ; \    reverse solidus U+005C
+ *            %x2F /          ; /    solidus         U+002F
+ *            %x62 /          ; b    backspace       U+0008
+ *            %x66 /          ; f    form feed       U+000C
+ *            %x6E /          ; n    line feed       U+000A
+ *            %x72 /          ; r    carriage return U+000D
+ *            %x74 /          ; t    tab             U+0009
+ *            %x75 4HEXDIG )  ; uXXXX                U+XXXX
+ *    escape = %x5C              ; \
+ *    quotation-mark = %x22      ; "
+ *    unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
+ *
+ * Extensions over RFC 7159:
+ * - Extra escape sequence in strings:
+ *   0x27 (apostrophe) is recognized after escape, too
+ * - Single-quoted strings:
+ *   Like double-quoted strings, except they're delimited by %x27
+ *   (apostrophe) instead of %x22 (quotation mark), and can't contain
+ *   unescaped apostrophe, but can contain unescaped quotation mark.
+ *
+ * Note:
+ * - Encoding is modified UTF-8.

That is an extension over RFC 7159. But I'm okay with leaving it in the Notes section.

+ * - Invalid Unicode characters are rejected.
+ * - Control characters are rejected by the lexer.

Worth being explicit that this is 00-1f, fe, and ff?
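
Spelled out as code, the set in question would be something like the sketch below. The function name is hypothetical; the ranges are simply the complement of the [0x20 ... 0xFD] range in the lexer hunk above.

```c
#include <stdbool.h>

/*
 * Hypothetical predicate: true for the bytes the lexer refuses inside
 * a string, i.e. 0x00-0x1F, 0xFE and 0xFF.  Everything in
 * 0x20..0xFD is accepted at the lexer level and left to the parser.
 */
static bool lexer_rejects_byte(unsigned char c)
{
    return c < 0x20 || c > 0xFD;
}
```

Note that 0xFE and 0xFF can never occur in well-formed UTF-8, so rejecting them in the lexer does not exclude any valid character.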

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org