Re: [Qemu-devel] [PATCH 24/56] json: Accept overlong \xC0\x80 as U+0000
From: Markus Armbruster
Subject: Re: [Qemu-devel] [PATCH 24/56] json: Accept overlong \xC0\x80 as U+0000 ("modified UTF-8")
Date: Fri, 17 Aug 2018 09:18:42 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
Markus Armbruster <address@hidden> writes:
> Eric Blake <address@hidden> writes:
>
>> On 08/10/2018 10:48 AM, Eric Blake wrote:
>>> On 08/08/2018 07:03 AM, Markus Armbruster wrote:
>>>> This is consistent with qobject_to_json(). See commit e2ec3f97680.
>>>
>>> Side note: that commit mentions that on output, ASCII DEL (0x7f) is
>>> always escaped. RFC 7159 does not require it to be escaped on input,
>
> Weird, isn't it?
>
>>> but I wonder if any of your earlier testsuite improvements should
>>> specifically cover \x7f vs. \u007f on input being canonicalized to
>>> \u007f on round trip output.
>
> From utf8_string():
>
> /* 2.2.1 1 byte U+007F */
> {
> "\x7F",
> "\x7F",
> "\\u007F",
> },
>
> We test parsing of JSON "\x7F" (expecting C string "\x7F"), unparsing of
> that C string (expecting JSON "\\u007F"), and after PATCH 29 parsing of
> that JSON (expecting the C string again). Sufficient?
>
>>>>
>>>> Signed-off-by: Markus Armbruster <address@hidden>
>>>> ---
>>>> qobject/json-lexer.c | 2 +-
>>>> qobject/json-parser.c | 2 +-
>>>> tests/check-qjson.c | 8 +-------
>>>> 3 files changed, 3 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
>>>> index ca1e0e2c03..36fb665b12 100644
>>>> --- a/qobject/json-lexer.c
>>>> +++ b/qobject/json-lexer.c
>>>> @@ -93,7 +93,7 @@
>>>> * interpolation = %((l|ll|I64)[du]|[ipsf])
>>>> *
>>>> * Note:
>>>> - * - Input must be encoded in UTF-8.
>>>> + * - Input must be encoded in modified UTF-8.
>>>
>>> Worth documenting this in the QMP doc as an explicit extension?
>
> qmp-spec.txt:
>
> The server expects its input to be encoded in UTF-8, and sends its
> output encoded in ASCII.
>
> The obvious update would be to stick in "modified".

Not really necessary, because:

* Before this patch, the JSON parser rejects \0 as an ASCII control
  character, and \xC0\x80 as overlong UTF-8.

  Note that PATCH 17 fixed rejection of \0 in JSON strings.  PATCH 21
  fixed rejection of invalid UTF-8, but rejection of \xC0\x80 wasn't
  broken.

* This patch makes \xC0\x80 pass the "invalid UTF-8" check, only to get
  rejected as an ASCII control character.  The error message changes,
  that's all.
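
The two bullets can be illustrated with a minimal C sketch of the checks as they behave after this patch. It is not QEMU's code: decode_mod_utf8() and json_string_error() are hypothetical names, escape sequences are ignored, and the error strings are illustrative only. Stage one accepts \xC0\x80 as the only overlong form and decodes it to U+0000; stage two then rejects U+0000 like any other ASCII control character, whether it arrived as a raw \0 or as \xC0\x80.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch, not QEMU's code.  Decode one modified UTF-8
 * sequence at s (n bytes available).  On success return the code point
 * and store the sequence length in *len; return -1 for invalid input.
 * Modified UTF-8 is UTF-8, except that U+0000 may also be written as the
 * overlong pair \xC0\x80.  Rejecting U+0000 is left to the caller.
 */
static long decode_mod_utf8(const unsigned char *s, size_t n, size_t *len)
{
    static const long min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t need, i;
    long cp;

    assert(len != NULL);
    if (n == 0) {
        return -1;
    }
    if (s[0] < 0x80) {
        *len = 1;
        return s[0];            /* ASCII, including a raw \0 byte */
    }
    if ((s[0] & 0xE0) == 0xC0) {
        need = 2;
        cp = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {
        need = 3;
        cp = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {
        need = 4;
        cp = s[0] & 0x07;
    } else {
        return -1;              /* stray continuation byte or 0xF8..0xFF */
    }
    if (n < need) {
        return -1;              /* truncated sequence */
    }
    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80) {
            return -1;          /* expected a continuation byte */
        }
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    *len = need;
    if (need == 2 && cp == 0) {
        return 0;               /* \xC0\x80: the one overlong form accepted */
    }
    if (cp < min_cp[need] || cp > 0x10FFFF
        || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return -1;              /* overlong, beyond U+10FFFF, or surrogate */
    }
    return cp;
}

/*
 * Hypothetical: return NULL if the n bytes at str are acceptable as the
 * body of a JSON string, else a short reason for rejecting them.
 */
static const char *json_string_error(const char *str, size_t n)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t len;

    while (n > 0) {
        long cp = decode_mod_utf8(s, n, &len);

        if (cp < 0) {
            return "invalid UTF-8";
        }
        if (cp < 0x20) {
            return "control character"; /* raw \0 and \xC0\x80 alike */
        }
        s += len;
        n -= len;
    }
    return NULL;
}
```

For instance, json_string_error("\xC0\x80", 2) reports a control character, while the overlong encoding of U+0001, "\xC0\x81", is still reported as invalid UTF-8.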

The patch's benefit is consistency with the other direction:
qobject_to_json() maps \xC0\x80 to \\u0000.  I guess my commit message
should explain this a bit better.
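
The output direction can be sketched the same way. Assuming a C string carries U+0000 as the modified UTF-8 pair \xC0\x80, an encoder emits it as \u0000; it also escapes DEL, matching the \u007F expectation quoted from utf8_string(). mod_utf8_to_json() below is a hypothetical helper, not qobject_to_json() itself: it handles ASCII plus \xC0\x80 only, and fails on other non-ASCII input instead of decoding it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not qobject_to_json(): write the JSON string body
 * for the modified UTF-8 C string in to out (outsz bytes, including the
 * terminator).  Returns 0 on success; on failure returns -1 and leaves out
 * empty.  Fails if out is too small or in holds non-ASCII bytes other than
 * the pair \xC0\x80, which this sketch does not decode.
 */
static int mod_utf8_to_json(const char *in, char *out, size_t outsz)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t used = 0;

    assert(outsz > 0);
    while (*s) {
        char buf[7];            /* longest piece is 6 chars plus the NUL */
        size_t n;

        if (s[0] == 0xC0 && s[1] == 0x80) {
            strcpy(buf, "\\u0000"); /* modified UTF-8 encoding of U+0000 */
            s += 2;
        } else if (s[0] >= 0x80) {
            goto fail;          /* general UTF-8 omitted in this sketch */
        } else if (s[0] == '"' || s[0] == '\\') {
            snprintf(buf, sizeof(buf), "\\%c", s[0]);
            s++;
        } else if (s[0] < 0x20 || s[0] == 0x7F) {
            /* control characters and DEL, uppercase hex */
            snprintf(buf, sizeof(buf), "\\u%04X", (unsigned)s[0]);
            s++;
        } else {
            buf[0] = (char)s[0];
            buf[1] = '\0';
            s++;
        }
        n = strlen(buf);
        if (used + n >= outsz) {
            goto fail;          /* no room for this piece plus the NUL */
        }
        memcpy(out + used, buf, n);
        used += n;
    }
    out[used] = '\0';
    return 0;

fail:
    out[0] = '\0';
    return -1;
}
```

A full encoder would instead decode every UTF-8 sequence and emit \uXXXX (using a surrogate pair above U+FFFF), so that the output stays ASCII, as qmp-spec.txt promises.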
[...]