Re: [Qemu-devel] [PATCH 24/56] json: Accept overlong \xC0\x80 as U+0000

qemu-devel

From:	Markus Armbruster
Subject:	Re: [Qemu-devel] [PATCH 24/56] json: Accept overlong \xC0\x80 as U+0000 ("modified UTF-8")
Date:	Fri, 17 Aug 2018 09:18:42 +0200
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)

Markus Armbruster <address@hidden> writes:

> Eric Blake <address@hidden> writes:
>
>> On 08/10/2018 10:48 AM, Eric Blake wrote:
>>> On 08/08/2018 07:03 AM, Markus Armbruster wrote:
>>>> This is consistent with qobject_to_json().  See commit e2ec3f97680.
>>>
>>> Side note: that commit mentions that on output, ASCII DEL (0x7f) is
>>> always escaped. RFC 7159 does not require it to be escaped on input,
>
> Weird, isn't it?
>
>>> but I wonder if any of your earlier testsuite improvements should
>>> specifically cover \x7f vs. \u007f on input being canonicalized to
>>> \u007f on round trip output.
>
> From utf8_string():
>
>         /* 2.2.1  1 byte U+007F */
>         {
>             "\x7F",
>             "\x7F",
>             "\\u007F",
>         },
>
> We test parsing of JSON "\x7F" (expecting C string "\x7F"), unparsing of
> that C string (expecting JSON "\\u007F"), and after PATCH 29 parsing of
> that JSON (expecting the C string again).  Sufficient?
>
>>>>
>>>> Signed-off-by: Markus Armbruster <address@hidden>
>>>> ---
>>>>   qobject/json-lexer.c  | 2 +-
>>>>   qobject/json-parser.c | 2 +-
>>>>   tests/check-qjson.c   | 8 +-------
>>>>   3 files changed, 3 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
>>>> index ca1e0e2c03..36fb665b12 100644
>>>> --- a/qobject/json-lexer.c
>>>> +++ b/qobject/json-lexer.c
>>>> @@ -93,7 +93,7 @@
>>>>    *   interpolation = %((l|ll|I64)[du]|[ipsf])
>>>>    *
>>>>    * Note:
>>>> - * - Input must be encoded in UTF-8.
>>>> + * - Input must be encoded in modified UTF-8.
>>>
>>> Worth documenting this in the QMP doc as an explicit extension?
>
> qmp-spec.txt:
>
>     The server expects its input to be encoded in UTF-8, and sends its
>     output encoded in ASCII.
>
> The obvious update would be to stick in "modified".

Not really necessary, because:

* Before this patch, the JSON parser rejects \0 as ASCII control
  character, and \xC0\x80 as overlong UTF-8.

  Note that PATCH 17 fixed rejection of \0 in JSON strings.  PATCH 21
  fixed rejection of invalid UTF-8, but \xC0\x80 wasn't broken.

* This patch makes \xC0\x80 pass the "invalid UTF-8" check, only to get
  rejected as ASCII control character.  The error message changes,
  that's all.

The patch's benefit is consistency with the other direction:
qobject_to_json() maps \xC0\x80 to \\u0000.  I guess my commit message
should explain this a bit better.
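
To illustrate the two bullet points above, here is a rough sketch of the
check order after this patch.  It is not the code in json-lexer.c or
json-parser.c: the names decode_mutf8_short(), check_string_char() and
enum verdict are made up for this mail, and only one- and two-byte
sequences are handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical verdicts, not QEMU's actual error reporting. */
enum verdict { ACCEPT, REJECT_INVALID_UTF8, REJECT_CONTROL };

/*
 * Decode one sequence of at most two bytes as "modified UTF-8".
 * Returns the code point, or -1 if the bytes are not acceptable.
 * Longer sequences are outside this sketch and also yield -1.
 */
static int decode_mutf8_short(const unsigned char *s, size_t n, size_t *len)
{
    assert(s != NULL && len != NULL);

    if (n >= 1 && s[0] < 0x80) {
        *len = 1;
        return s[0];
    }
    if (n >= 2 && (s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        int cp = ((s[0] & 0x1F) << 6) | (s[1] & 0x3F);

        *len = 2;
        /* Two bytes encoding less than U+0080 are overlong; the only
         * overlong form modified UTF-8 permits is \xC0\x80. */
        if (cp >= 0x80 || (s[0] == 0xC0 && s[1] == 0x80)) {
            return cp;
        }
    }
    *len = 1;
    return -1;
}

/* First the encoding check, then the JSON control character check. */
static enum verdict check_string_char(const unsigned char *s, size_t n)
{
    size_t len;
    int cp = decode_mutf8_short(s, n, &len);

    if (cp < 0) {
        return REJECT_INVALID_UTF8;
    }
    if (cp < 0x20) {
        return REJECT_CONTROL;  /* U+0000..U+001F must be escaped */
    }
    return ACCEPT;
}
```

With this ordering \xC0\x80 gets past the encoding check, decodes to
U+0000, and is then rejected as a control character, while another
overlong form such as \xC1\x80 still fails the encoding check.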

[...]
