Re: [Qemu-devel] [PATCH] check-qjson: More thorough testing of UTF-8 in strings

qemu-devel

From:	Markus Armbruster
Subject:	Re: [Qemu-devel] [PATCH] check-qjson: More thorough testing of UTF-8 in strings
Date:	Mon, 04 Feb 2013 19:09:13 +0100
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/24.1 (gnu/linux)

Paolo Bonzini <address@hidden> writes:

> On 04/02/2013 18:19, Markus Armbruster wrote:
>> +        /* 2  Boundary condition test cases */
>> +        /* 2.1  First possible sequence of a certain length */
>> +        /* 2.1.5  5 bytes U+200000 */
>> +        {
>> +            "\"\xF8\x88\x80\x80\x80\"",
>> +            NULL,                        /* bug: rejected */
>> +            "\"\\u8200\\uFFFF\\uFFFF\"", /* bug: want "\"\\uFFFF\"" */
>> +            "\xF8\x88\x80\x80\x80",
>> +        },
>> +        /* 2.1.6  6 bytes U+4000000 */
>> +        {
>> +            "\"\xFC\x84\x80\x80\x80\x80\"",
>> +            NULL,                               /* bug: rejected */
>> +            "\"\\uC100\\uFFFF\\uFFFF\\uFFFF\"", /* bug: want "\"\\uFFFF\"" */
>> +            "\xFC\x84\x80\x80\x80\x80",
>> +        },
>> +        /* 2.2.4  4 bytes U+1FFFFF */
>> +        {
>> +            "\"\xF7\xBF\xBF\xBF\"",
>> +            NULL,                 /* bug: rejected */
>> +            "\"\\u7FFF\\uFFFF\"", /* bug: want "\"\\uFFFF\"" */
>> +            "\xF7\xBF\xBF\xBF",
>> +        },
>> +        /* 2.2.5  5 bytes U+3FFFFFF */
>> +        {
>> +            "\"\xFB\xBF\xBF\xBF\xBF\"",
>> +            NULL,                        /* bug: rejected */
>> +            "\"\\uBFFF\\uFFFF\\uFFFF\"", /* bug: want "\"\\uFFFF\"" */
>> +            "\xFB\xBF\xBF\xBF\xBF",
>> +        },
>> +        /* 2.2.6  6 bytes U+7FFFFFFF */
>> +        {
>> +            "\"\xFD\xBF\xBF\xBF\xBF\xBF\"",
>> +            NULL,                               /* bug: rejected */
>> +            "\"\\uDFFF\\uFFFF\\uFFFF\\uFFFF\"", /* bug: want "\"\\uFFFF\"" */
>> +            "\xFD\xBF\xBF\xBF\xBF\xBF",
>> +        },
>> +        {
>> +            /* \U+1FFFFF */
>> +            "\"\xF8\x87\xBF\xBF\xBF\"",
>> +            NULL,                        /* bug: rejected */
>> +            "\"\\u81FF\\uFFFF\\uFFFF\"", /* bug: want "\"\\uFFFF\"" */
>> +            "\xF8\x87\xBF\xBF\xBF",
>> +        },
>> +        {
>> +            /* \U+3FFFFFF */
>> +            "\"\xFC\x83\xBF\xBF\xBF\xBF\"",
>> +            NULL,                               /* bug: rejected */
>> +            "\"\\uC0FF\\uFFFF\\uFFFF\\uFFFF\"", /* bug: want "\"\\uFFFF\"" */
>> +            "\xFC\x83\xBF\xBF\xBF\xBF",
>> +        },
>> +        {
>> +            /* \U+0000 */
>> +            "\"\xF8\x80\x80\x80\x80\"",
>> +            NULL,                        /* bug: rejected */
>> +            "\"\\u8000\\uFFFF\\uFFFF\"", /* bug: want "\"\\u0000\"" */
>> +            "\xF8\x80\x80\x80\x80",
>> +        },
>> +        {
>> +            /* \U+0000 */
>> +            "\"\xFC\x80\x80\x80\x80\x80\"",
>> +            NULL,                               /* bug: rejected */
>> +            "\"\\uC000\\uFFFF\\uFFFF\\uFFFF\"", /* bug: want "\"\\u0000\"" */
>> +            "\xFC\x80\x80\x80\x80\x80",
>> +        },
>
> Rejecting these is not a bug IMO.  Unicode is only defined up to
> U+10FFFF.  Codepoints above are not valid UTF-8 at all, and in
> particular 5/6-byte sequences are never valid UTF-8 (they used to be).

See the explanation of the bug markers earlier in the patch:

+         * - bug: rejected
+         *   JSON parser rejects invalid sequence(s)
+         *   We may choose to define this as feature
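
If we do define rejection as a feature, the rules a strict decoder would
enforce are the RFC 3629 ones: at most four bytes per sequence, no
overlong encodings, no surrogates, nothing above U+10FFFF.  A minimal
sketch follows.  utf8_decode_strict() is a hypothetical helper written
for illustration, not code from QEMU's JSON parser.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not QEMU code: decode one UTF-8 sequence from
 * s[0..len-1] under RFC 3629 rules.  On success, return the code point
 * and store the sequence length in *used.  Return -1 for anything
 * invalid; *used is left untouched in that case.
 */
static int32_t utf8_decode_strict(const unsigned char *s, size_t len,
                                  size_t *used)
{
    /* Smallest code point that needs n bytes; below that is overlong */
    static const int32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    int32_t cp;
    size_t n, i;

    if (len == 0) {
        return -1;
    }
    if (s[0] < 0x80) {
        cp = s[0];
        n = 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        cp = s[0] & 0x1F;
        n = 2;
    } else if ((s[0] & 0xF0) == 0xE0) {
        cp = s[0] & 0x0F;
        n = 3;
    } else if ((s[0] & 0xF8) == 0xF0) {
        cp = s[0] & 0x07;
        n = 4;
    } else {
        /* Stray continuation byte, 5/6-byte lead (0xF8..0xFD), 0xFE, 0xFF */
        return -1;
    }
    if (len < n) {
        return -1;              /* truncated sequence */
    }
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) {
            return -1;          /* not a continuation byte */
        }
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    if (cp < min_cp[n]) {
        return -1;              /* overlong encoding */
    }
    if (cp > 0x10FFFF) {
        return -1;              /* beyond the Unicode range */
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return -1;              /* surrogate code point */
    }
    *used = n;
    return cp;
}
```

Under these rules "\xF0\x90\x80\x80" decodes to U+10000 and
"\xF4\x8F\xBF\xBF" to U+10FFFF, while "\xF4\x90\x80\x80" (U+110000),
"\xF7\xBF\xBF\xBF" (U+1FFFFF) and every 5- and 6-byte sequence quoted
above come back as invalid.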

> But there are indeed other bugs...
>
>> +        /* 2.1.4  4 bytes U+10000 */
>> +        {
>> +            "\"\xF0\x90\x80\x80\"",
>> +            "\xF0\x90\x80\x80",
>> +            "\"\\u0400\\uFFFF\"", /* bug: want "\"\\uD800\\uDC00\"" */
>> +        },
>> +        {
>> +            /* U+10FFFF */
>> +            "\"\xF4\x8F\xBF\xBF\"",
>> +            "\xF4\x8F\xBF\xBF",
>> +            "\"\\u43FF\\uFFFF\"", /* bug: want "\"\\uDBFF\\uDFFF\"" */
>> +        },
>> +        {
>> +            /* U+110000 */
>> +            "\"\xF4\x90\x80\x80\"",
>> +            "\xF4\x90\x80\x80",
>> +            "\"\\u4400\\uFFFF\"", /* bug: want "\"\\uFFFF\"" */
>> +        },
>
> ...and also some good catches here!  In particular U+110000 should be
> rejected.

Thanks!
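
For reference, the "want" values for U+10000 and U+10FFFF follow from
the usual UTF-16 rule: subtract 0x10000, then put the high 10 bits on
0xD800 and the low 10 bits on 0xDC00.  A minimal sketch, assuming a
hypothetical helper json_escape_codepoint() that is not part of QEMU:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not QEMU code: write the JSON \u escape(s) for
 * code point cp into buf, which should hold at least 13 bytes.
 * Returns the number of characters written, or -1 if cp is a
 * surrogate or lies above U+10FFFF.
 */
static int json_escape_codepoint(uint32_t cp, char *buf, size_t size)
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return -1;
    }
    if (cp < 0x10000) {
        /* Basic Multilingual Plane: a single \uXXXX escape */
        return snprintf(buf, size, "\\u%04X", (unsigned)cp);
    }
    cp -= 0x10000;              /* 20 bits remain */
    return snprintf(buf, size, "\\u%04X\\u%04X",
                    (unsigned)(0xD800 + (cp >> 10)),    /* high surrogate */
                    (unsigned)(0xDC00 + (cp & 0x3FF))); /* low surrogate */
}
```

This yields \uD800\uDC00 for U+10000 and \uDBFF\uDFFF for U+10FFFF,
matching the expected strings in the test.  For U+110000 it returns -1;
a caller that follows the test's expectation would emit \uFFFF there.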
