Re: [Qemu-devel] [PATCH 23/56] json: Leave rejecting invalid UTF-8 to pa
From: Eric Blake
Subject: Re: [Qemu-devel] [PATCH 23/56] json: Leave rejecting invalid UTF-8 to parser
Date: Fri, 10 Aug 2018 10:36:02 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
> Both the lexer and the parser (attempt to) validate UTF-8 in JSON
> strings.
>
> The commit before previous made the parser reject invalid UTF-8
> sequences.  Since then, anything the lexer rejects, the parser would
> reject as well.  Thus, the lexer's rejecting is unnecessary for
> correctness, and harmful for error reporting.

Nice analysis.

> However, we want to keep rejecting ASCII control characters in the
> lexer, because that produces the behavior we want for unclosed
> strings.
>
> We also need to keep rejecting \xFF in the lexer, because we
> documented that as a way to reset the JSON parser
> (docs/interop/qmp-spec.txt section 2.6 QGA Synchronization), which
> means we can't change how we recover from this error now.  I wish we
> hadn't done that.

Or, as a design decision, we could give 0xff special meaning: cause a
lexer reset without also emitting an error message. (That doesn't change
this patch; it would be a change on top.)

> I think we should treat \xFE the same as \xFF.

Reasonable, as it would cover byte-order marks.
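
To make the reset mechanism discussed above concrete, here is a minimal Python sketch of what a client might send. It is illustrative only and is not QEMU code: `build_sync_request` and `is_reset_byte` are hypothetical helper names, and the `guest-sync-delimited` command with its `id` argument is assumed from the guest agent documentation rather than taken from this thread.

```python
import json

# 0xFF can never occur in well-formed UTF-8, so a lexer that rejects it
# can use it as an out-of-band "start over" marker.
RESET_BYTE = b"\xff"


def build_sync_request(sync_id: int) -> bytes:
    """Hypothetical helper: prefix a sync command with the reset byte."""
    # Command name and argument are assumptions, not taken from the patch.
    command = {"execute": "guest-sync-delimited", "arguments": {"id": sync_id}}
    return RESET_BYTE + json.dumps(command).encode("utf-8") + b"\n"


def is_reset_byte(byte: int) -> bool:
    """Treat 0xFE like 0xFF, as proposed in the thread."""
    return byte in (0xFE, 0xFF)
```

Neither 0xFE nor 0xFF appears in any valid UTF-8 sequence, so handling them alike costs nothing for well-formed input.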

> Change the lexer to accept \xC0..\xC1 and \xF5..\xFD.  It now rejects
> only \x00..\x1F and \xFE..\xFF.  Error reporting for invalid UTF-8 in
> strings is much improved, except for \xFE and \xFF.  For the example
> above, the lexer now produces
>
>     JSON_LCURLY   {
>     JSON_STRING   "abc\xC0\xAFijk"
>     JSON_COLON    :
>     JSON_INTEGER  1
>     JSON_RCURLY
>
> and the parser reports just
>
>     JSON parse error, invalid UTF-8 sequence in string
>
> Signed-off-by: Markus Armbruster <address@hidden>
> ---
>  qobject/json-lexer.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)

Reviewed-by: Eric Blake <address@hidden>
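
For readers following along, the division of labour the patch describes can be sketched in Python as below. This is an assumption-laden illustration, not the code in qobject/json-lexer.c: the function names are hypothetical, and Python's strict UTF-8 decoder stands in for the parser's own validation, which may differ in details.

```python
def lexer_rejects(byte: int) -> bool:
    """Bytes the lexer still refuses inside a string after this patch:
    \\x00..\\x1F (control characters) and \\xFE..\\xFF."""
    return byte <= 0x1F or byte >= 0xFE


def parser_accepts(raw: bytes) -> bool:
    """Stand-in for the parser's UTF-8 check (assumption: strict decoding)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def check_string(raw: bytes) -> str:
    """Classify the body of a JSON string the way the two stages would."""
    if any(lexer_rejects(b) for b in raw):
        return "lexer error"
    if not parser_accepts(raw):
        return "JSON parse error, invalid UTF-8 sequence in string"
    return "ok"
```

With this split, `check_string(b"abc\xC0\xAFijk")` gets past the lexer stage and is reported once by the parser stage, while a string containing \xFF is still stopped at the lexer.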
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org