qemu-devel

Re: [Qemu-devel] [PATCH 24/56] json: Accept overlong \xC0\x80 as U+0000


From: Markus Armbruster
Subject: Re: [Qemu-devel] [PATCH 24/56] json: Accept overlong \xC0\x80 as U+0000 ("modified UTF-8")
Date: Fri, 17 Aug 2018 09:18:42 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)

Markus Armbruster <address@hidden> writes:

> Eric Blake <address@hidden> writes:
>
>> On 08/10/2018 10:48 AM, Eric Blake wrote:
>>> On 08/08/2018 07:03 AM, Markus Armbruster wrote:
>>>> This is consistent with qobject_to_json().  See commit e2ec3f97680.
>>>
>>> Side note: that commit mentions that on output, ASCII DEL (0x7f) is
>>> always escaped. RFC 7159 does not require it to be escaped on input,
>
> Weird, isn't it?
>
>>> but I wonder if any of your earlier testsuite improvements should
>>> specifically cover \x7f vs. \u007f on input being canonicalized to
>>> \u007f on round trip output.
>
> From utf8_string():
>
>         /* 2.2.1  1 byte U+007F */
>         {
>             "\x7F",
>             "\x7F",
>             "\\u007F",
>         },
>
> We test parsing of JSON "\x7F" (expecting C string "\x7F"), unparsing of
> that C string (expecting JSON "\\u007F"), and after PATCH 29 parsing of
> that JSON (expecting the C string again).  Sufficient?
>
>>>>
>>>> Signed-off-by: Markus Armbruster <address@hidden>
>>>> ---
>>>>   qobject/json-lexer.c  | 2 +-
>>>>   qobject/json-parser.c | 2 +-
>>>>   tests/check-qjson.c   | 8 +-------
>>>>   3 files changed, 3 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
>>>> index ca1e0e2c03..36fb665b12 100644
>>>> --- a/qobject/json-lexer.c
>>>> +++ b/qobject/json-lexer.c
>>>> @@ -93,7 +93,7 @@
>>>>    *   interpolation = %((l|ll|I64)[du]|[ipsf])
>>>>    *
>>>>    * Note:
>>>> - * - Input must be encoded in UTF-8.
>>>> + * - Input must be encoded in modified UTF-8.
>>>
>>> Worth documenting this in the QMP doc as an explicit extension?
>
> qmp-spec.txt:
>
>     The server expects its input to be encoded in UTF-8, and sends its
>     output encoded in ASCII.
>
> The obvious update would be to stick in "modified".

Not really necessary, because:

* Before this patch, the JSON parser rejects \0 as an ASCII control
  character, and \xC0\x80 as an overlong UTF-8 sequence.

  Note that PATCH 17 fixed the rejection of \0 in JSON strings, and PATCH
  21 fixed the rejection of invalid UTF-8, but the rejection of \xC0\x80
  was never broken.

* This patch makes \xC0\x80 pass the "invalid UTF-8" check, only to get
  rejected as an ASCII control character.  Only the error message
  changes.
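To make the two bullets concrete, here is a minimal C sketch of the input
side, assuming a string checker that first decodes modified UTF-8 and then
rejects control characters.  decode_mutf8(), check_string() and the
str_err codes are hypothetical names made up for illustration; this is not
the code in json-lexer.c or json-parser.c, and it only handles 1- and
2-byte sequences.

```c
#include <stddef.h>

/* Hypothetical result codes, for this sketch only. */
enum str_err { STR_OK, STR_INVALID_UTF8, STR_CONTROL_CHAR };

/*
 * Decode one code point from modified UTF-8.  s points at n >= 1 bytes.
 * Sets *len to the number of bytes consumed and returns the code point,
 * or -1 if the sequence is invalid.  3- and 4-byte sequences are not
 * handled in this sketch and are reported as invalid.
 */
static int decode_mutf8(const unsigned char *s, size_t n, size_t *len)
{
    *len = 1;
    if (s[0] < 0x80) {
        return s[0];                      /* plain ASCII, including raw \0 */
    }
    if ((s[0] & 0xE0) == 0xC0 && n >= 2 && (s[1] & 0xC0) == 0x80) {
        int cp = ((s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        *len = 2;
        if (cp == 0) {
            return 0;                     /* \xC0\x80: modified UTF-8 for U+0000 */
        }
        return cp >= 0x80 ? cp : -1;      /* every other overlong form is invalid */
    }
    return -1;
}

/* Check n bytes of string contents: encoding first, control characters second. */
static enum str_err check_string(const char *buf, size_t n)
{
    const unsigned char *s = (const unsigned char *)buf;
    size_t i = 0, len;

    while (i < n) {
        int cp = decode_mutf8(s + i, n - i, &len);
        if (cp < 0) {
            return STR_INVALID_UTF8;
        }
        if (cp < 0x20) {
            return STR_CONTROL_CHAR;      /* JSON forbids unescaped U+0000..U+001F */
        }
        i += len;
    }
    return STR_OK;
}
```

With this ordering, \xC0\x80 and a raw \0 both end up as STR_CONTROL_CHAR,
while another overlong form such as \xC0\xAF still fails as
STR_INVALID_UTF8.  Compared with a decoder that rejects every overlong
form, only the reported error differs for \xC0\x80.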

The patch's benefit is consistency with the other direction:
qobject_to_json() maps \xC0\x80 to \\u0000.  I guess my commit message
should explain this a bit better.
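For the other direction, a similarly hedged sketch of an ASCII-only escaper
that maps \xC0\x80 to \u0000 and DEL to \u007F.  json_escape() is a
hypothetical helper, not qobject_to_json(): it escapes every control
character as \uXXXX instead of using short forms such as \n, handles only
1- and 2-byte sequences, and substitutes \uFFFD for anything it cannot
decode.

```c
#include <stdio.h>

/*
 * Hypothetical sketch, not QEMU's qobject_to_json(): write the contents of
 * a JSON string for the modified-UTF-8 C string `in`, using ASCII only.
 * `out` must have room for 6 * strlen(in) + 1 bytes.
 */
static void json_escape(const char *in, char *out)
{
    const unsigned char *s = (const unsigned char *)in;

    while (*s) {
        unsigned cp;

        if (s[0] < 0x80) {
            cp = s[0];
            s += 1;
        } else if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
            cp = ((s[0] & 0x1Fu) << 6) | (s[1] & 0x3Fu);
            s += 2;
            if (cp != 0 && cp < 0x80) {
                cp = 0xFFFD;              /* overlong, but not \xC0\x80 */
            }
        } else {
            cp = 0xFFFD;                  /* invalid, or a 3-/4-byte form not handled here */
            s += 1;
        }

        if (cp == '"' || cp == '\\') {
            *out++ = '\\';
            *out++ = (char)cp;
        } else if (cp >= 0x20 && cp < 0x7F) {
            *out++ = (char)cp;            /* printable ASCII passes through */
        } else {
            /* control characters, DEL, U+0000 from \xC0\x80, non-ASCII */
            out += sprintf(out, "\\u%04X", cp);
        }
    }
    *out = '\0';
}
```

Feeding it the C string "\xC0\x80" produces the six characters \u0000,
which a JSON parser reads back as U+0000; "\x7F" produces \u007F, matching
the utf8_string() expectation quoted above.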

[...]


