From: Markus Armbruster
Subject: [Qemu-devel] [PATCH v2 23/60] json: Leave rejecting invalid UTF-8 to parser
Date: Fri, 17 Aug 2018 17:05:22 +0200

Both the lexer and the parser (attempt to) validate UTF-8 in JSON
strings.

The lexer rejects bytes that can't occur in valid UTF-8: \xC0..\xC1,
\xF5..\xFF. This rejects some, but not all, invalid UTF-8. It also
rejects ASCII control characters \x00..\x1F, in accordance with RFC
7159 (see recent commit "json: Reject unescaped control characters").

When the lexer rejects, it ends the token right after the first bad
byte. That's good when the bad byte is a newline, but not so good when
it's something like an overlong sequence in the middle of a string.
For instance, the input

    {"abc\xC0\xAFijk": 1}\n

produces the tokens

    JSON_LCURLY   {
    JSON_ERROR    "abc\xC0
    JSON_ERROR    \xAF
    JSON_KEYWORD  ijk
    JSON_ERROR    ": 1}\n

The parser then reports four errors

    Invalid JSON syntax
    Invalid JSON syntax
    JSON parse error, invalid keyword 'ijk'
    Invalid JSON syntax

before it recovers at the newline.

The commit before the previous one ("json: Reject invalid UTF-8
sequences") made the parser reject invalid UTF-8 sequences. Since
then, anything the lexer rejects, the parser would reject as well.
Thus, the lexer's rejecting is unnecessary for correctness, and
harmful for error reporting.

However, we want to keep rejecting ASCII control characters in the
lexer, because ending the token at the first control character
(typically the newline) produces the behavior we want for unclosed
strings.

We also need to keep rejecting \xFF in the lexer, because we
documented that as a way to reset the JSON parser
(docs/interop/qmp-spec.txt section 2.6 QGA Synchronization), which
means we can't change how we recover from this error now. I wish we
hadn't done that.

I think we should treat \xFE the same as \xFF.

Change the lexer to accept \xC0..\xC1 and \xF5..\xFD. It now rejects
only \x00..\x1F and \xFE..\xFF. Error reporting for invalid UTF-8 in
strings is much improved, except for \xFE and \xFF. For the example
above, the lexer now produces

    JSON_LCURLY   {
    JSON_STRING   "abc\xC0\xAFijk"
    JSON_COLON    :
    JSON_INTEGER  1
    JSON_RCURLY   }

and the parser reports just

    JSON parse error, invalid UTF-8 sequence in string

Signed-off-by: Markus Armbruster <address@hidden>
Reviewed-by: Eric Blake <address@hidden>
---
qobject/json-lexer.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index 109a7d8bb8..ca1e0e2c03 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -177,8 +177,7 @@ static const uint8_t json_lexer[][256] = {
['u'] = IN_DQ_UCODE0,
},
[IN_DQ_STRING] = {
- [0x20 ... 0xBF] = IN_DQ_STRING,
- [0xC2 ... 0xF4] = IN_DQ_STRING,
+ [0x20 ... 0xFD] = IN_DQ_STRING,
['\\'] = IN_DQ_STRING_ESCAPE,
['"'] = JSON_STRING,
},
@@ -217,8 +216,7 @@ static const uint8_t json_lexer[][256] = {
['u'] = IN_SQ_UCODE0,
},
[IN_SQ_STRING] = {
- [0x20 ... 0xBF] = IN_SQ_STRING,
- [0xC2 ... 0xF4] = IN_SQ_STRING,
+ [0x20 ... 0xFD] = IN_SQ_STRING,
['\\'] = IN_SQ_STRING_ESCAPE,
['\''] = JSON_STRING,
},
--
2.17.1