Re: [Qemu-devel] [PATCH v2 28/60] json: Fix \uXXXX for surrogate pairs
From: Markus Armbruster
Subject: Re: [Qemu-devel] [PATCH v2 28/60] json: Fix \uXXXX for surrogate pairs
Date: Mon, 20 Aug 2018 10:40:04 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)

Eric Blake <address@hidden> writes:
> On 08/17/2018 10:05 AM, Markus Armbruster wrote:
>> The JSON parser treats each half of a surrogate pair as an unpaired
>> surrogate. Fix it to recognize surrogate pairs.
>>
>> Signed-off-by: Markus Armbruster <address@hidden>
>> Reviewed-by: Eric Blake <address@hidden>
>
> I might have dropped the R-b, to ensure the changes since v1 get
> re-reviewed.
I intended to, but screwed up. My apologies.
>> ---
>> qobject/json-parser.c | 60 ++++++++++++++++++++++++++++---------------
>> tests/check-qjson.c | 3 +--
>> 2 files changed, 40 insertions(+), 23 deletions(-)
>>
>
>> @@ -157,22 +169,28 @@ static QString *parse_string(JSONParserContext *ctxt, JSONToken *token)
>> qstring_append_chr(str, '\t');
>> break;
>> case 'u':
>> - cp = 0;
>> - for (i = 0; i < 4; i++) {
>> - if (!qemu_isxdigit(*ptr)) {
>> - parse_error(ctxt, token,
>> - "invalid hex escape sequence in string");
>> - goto out;
>> + cp = cvt4hex(ptr);
>> + ptr += 4;
>> +
>> + /* handle surrogate pairs */
>> + if (cp >= 0xD800 && cp <= 0xDBFF
>> + && ptr[0] == '\\' && ptr[1] == 'u') {
>> + /* leading surrogate followed by \u */
>> + cp = 0x10000 + ((cp & 0x3FF) << 10);
>> + trailing = cvt4hex(ptr + 2);
>> + if (trailing >= 0xDC00 && trailing <= 0xDFFF) {
>> + /* followed by trailing surrogate */
>> + cp |= trailing & 0x3FF;
>> + ptr += 6;
>> + } else {
>> + cp = -1; /* invalid */
>> }
>> - cp <<= 4;
>> - cp |= hex2decimal(*ptr);
>> - ptr++;
>> }
>> if (mod_utf8_encode(utf8_buf, sizeof(utf8_buf), cp) < 0) {
>> parse_error(ctxt, token,
>> - "\\u%.4s is not a valid Unicode character",
>> - ptr - 3);
>> + "%.*s is not a valid Unicode character",
>> + (int)(ptr - beg), beg);
>
> The error reporting here has indeed been improved over v1.
>
> Reviewed-by: Eric Blake <address@hidden>
Thanks!
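
For anyone following the arithmetic in the hunk above, here is a minimal
standalone sketch of the surrogate pair decoding. It is not the code from
the patch: this cvt4hex() is a hypothetical stand-in written from the way
the hunk uses it, decode_u_escape() is a name made up for illustration, and
rejecting lone surrogates directly in the helper is an assumption (the patch
passes the code point on to mod_utf8_encode() instead).

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the patch's cvt4hex(): convert exactly four
 * hex digits to their value, or return -1 if any of them is not a hex
 * digit.  Stops at the first bad character, so it never reads past a
 * terminating NUL.
 */
static int cvt4hex(const char *s)
{
    int cp = 0;

    for (int i = 0; i < 4; i++) {
        int c = (unsigned char)s[i];
        int digit;

        if (c >= '0' && c <= '9') {
            digit = c - '0';
        } else if (c >= 'a' && c <= 'f') {
            digit = c - 'a' + 10;
        } else if (c >= 'A' && c <= 'F') {
            digit = c - 'A' + 10;
        } else {
            return -1;
        }
        cp = (cp << 4) | digit;
    }
    return cp;
}

/*
 * Illustrative helper (not in QEMU): @ptr points just past a "\u".
 * Returns the decoded code point, or -1 if the escape is invalid.
 * On success, *consumed is the number of characters used after the
 * initial "\u": 4 for a BMP character, 10 for a surrogate pair.
 */
static int decode_u_escape(const char *ptr, size_t *consumed)
{
    const char *beg = ptr;
    int cp = cvt4hex(ptr);
    int trailing;

    if (cp < 0) {
        return -1;
    }
    ptr += 4;

    if (cp >= 0xD800 && cp <= 0xDBFF && ptr[0] == '\\' && ptr[1] == 'u') {
        /* leading surrogate followed by \u: high 10 bits of the offset */
        cp = 0x10000 + ((cp & 0x3FF) << 10);
        trailing = cvt4hex(ptr + 2);
        if (trailing >= 0xDC00 && trailing <= 0xDFFF) {
            /* trailing surrogate supplies the low 10 bits */
            cp |= trailing & 0x3FF;
            ptr += 6;
        } else {
            return -1;
        }
    }

    /* Assumption of this sketch: a lone surrogate is rejected here. */
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return -1;
    }

    *consumed = ptr - beg;
    return cp;
}
```

With this, the text following the first \u in "\uD83D\uDE00" decodes to
U+1F600 and consumes 10 characters, while "\uD83D\u0041" and a lone
"\uDC00" are both rejected.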