[Qemu-devel] [PATCH 2/6] json: Clean up how lexer consumes "end of input"

From:	Markus Armbruster
Subject:	[Qemu-devel] [PATCH 2/6] json: Clean up how lexer consumes "end of input"
Date:	Mon, 27 Aug 2018 09:00:17 +0200

When the lexer isn't in its start state at the end of input, it's
working on a token.  To flush it out, it needs to transit to its start
state on "end of input" lookahead.

There are two ways to the start state, depending on the current state:

* If the lexer is in a TERMINAL(JSON_FOO) state, it can emit a
  JSON_FOO token.

* Else, it can go to IN_ERROR state, and emit a JSON_ERROR token.

There are complications, however:

* The transition to IN_ERROR state consumes the input character and
  adds it to the JSON_ERROR token.  The latter is inappropriate for
  the "end of input" character, so we suppress that.  See also recent
  commit "json: Fix lexer to include the bad character in JSON_ERROR
  token".

* The transition to a TERMINAL(JSON_FOO) state doesn't consume the
  input character.  In that case, the lexer normally loops until it is
  consumed.  We have to suppress that for the "end of input" input
  character.  If we didn't, the lexer would consume it by entering
  IN_ERROR state, emitting a bogus JSON_ERROR token.  We fixed that in
  commit bd3924a33a6.
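
The two exits and both complications can be sketched with a toy lexer.
This is only an illustration, not QEMU's code: every demo_* name, the
three states, and the fixed-size buffers are assumptions invented for
the example.  Numbers stand in for a TERMINAL() token that needs
lookahead, and an unterminated string stands in for the IN_ERROR case.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy lexer; none of these names exist in QEMU. */
enum demo_state { DEMO_START, DEMO_IN_NUM, DEMO_IN_STR };
enum demo_kind { DEMO_NONE, DEMO_NUM, DEMO_STR, DEMO_ERROR };

struct demo_lexer {
    enum demo_state state;
    char text[32];            /* token being accumulated */
    size_t len;
    enum demo_kind last_kind; /* most recently emitted token */
    char last_text[32];
};

static void demo_append(struct demo_lexer *lx, char ch)
{
    if (lx->len + 1 < sizeof(lx->text)) {
        lx->text[lx->len++] = ch;
    }
}

/* Emit the accumulated token and return to the start state. */
static void demo_emit(struct demo_lexer *lx, enum demo_kind kind)
{
    lx->text[lx->len] = '\0';
    strcpy(lx->last_text, lx->text);
    lx->last_kind = kind;
    lx->len = 0;
    lx->state = DEMO_START;
}

/* flush == true means ch is the "end of input" pseudo-character. */
static void demo_feed_char(struct demo_lexer *lx, char ch, bool flush)
{
    bool char_consumed = false;

    while (flush ? lx->state != DEMO_START : !char_consumed) {
        bool digit = ch >= '0' && ch <= '9';

        switch (lx->state) {
        case DEMO_START:          /* only reached when !flush */
            char_consumed = true;
            if (digit) {
                demo_append(lx, ch);
                lx->state = DEMO_IN_NUM;
            } else if (ch == '"') {
                demo_append(lx, ch);
                lx->state = DEMO_IN_STR;
            } else if (ch != ' ') {
                demo_append(lx, ch);   /* bad character joins the error */
                demo_emit(lx, DEMO_ERROR);
            }
            break;
        case DEMO_IN_NUM:
            if (digit && !flush) {
                demo_append(lx, ch);
                char_consumed = true;
            } else {
                /* Terminal: lookahead is not consumed.  On flush the
                 * loop stops here; otherwise DEMO_START would turn
                 * "end of input" into a bogus error token. */
                demo_emit(lx, DEMO_NUM);
            }
            break;
        case DEMO_IN_STR:
            if (flush) {
                demo_emit(lx, DEMO_ERROR); /* end of input not appended */
            } else {
                demo_append(lx, ch);
                char_consumed = true;
                if (ch == '"') {
                    demo_emit(lx, DEMO_STR);
                }
            }
            break;
        }
    }
}

static void demo_feed(struct demo_lexer *lx, const char *s)
{
    for (; *s; s++) {
        demo_feed_char(lx, *s, false);
    }
}
```

Feeding "12" and then flushing with demo_feed_char(&lx, 0, true) emits
a DEMO_NUM token "12".  Feeding an unterminated string and flushing
emits a DEMO_ERROR token holding just the string's characters, without
the "end of input" character.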

However, simply breaking the loop this way assumes that the lexer
needs exactly one state transition to reach its start state.  That
assumption holds for now, but relying on it is unclean, and I'll soon
break it.  Clean up: instead of breaking the loop after one iteration,
break it once the lexer has reached its start state.
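
To see why the loop condition matters, here is a small sketch with a
state machine that needs two transitions to get from the middle of a
token back to its start state.  The machine is contrived and entirely
hypothetical (the toy_* names and the TOY_TRAILER state are made up,
and QEMU's lexer has no such state at this point in the series); only
the shape of the loop condition is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states; TOY_EMIT is a pseudo-state meaning "emit token". */
enum toy_state { TOY_START, TOY_IN_WORD, TOY_TRAILER, TOY_EMIT };

struct toy_lexer {
    enum toy_state state;
    int tokens;               /* number of tokens emitted so far */
};

static bool toy_is_letter(char ch)
{
    return ch >= 'a' && ch <= 'z';
}

/* Transition function; ch == 0 plays the role of "end of input". */
static enum toy_state toy_next(enum toy_state s, char ch)
{
    switch (s) {
    case TOY_START:
        return toy_is_letter(ch) ? TOY_IN_WORD : TOY_START;
    case TOY_IN_WORD:
        return toy_is_letter(ch) ? TOY_IN_WORD : TOY_TRAILER;
    case TOY_TRAILER:
        return TOY_EMIT;
    default:
        return TOY_START;
    }
}

/* Perform one transition; return whether ch was consumed. */
static bool toy_step(struct toy_lexer *lx, char ch, bool flush)
{
    enum toy_state next = toy_next(lx->state, ch);
    bool consumed = !flush && next != TOY_EMIT;

    if (next == TOY_EMIT) {
        lx->tokens++;
        next = TOY_START;
    }
    lx->state = next;
    return consumed;
}

/* Old shape: a flush performs exactly one transition. */
static void toy_flush_once(struct toy_lexer *lx)
{
    toy_step(lx, 0, true);
}

/* New shape: when flushing, loop until the start state is reached;
 * otherwise loop until the character has been consumed. */
static void toy_feed_char(struct toy_lexer *lx, char ch, bool flush)
{
    bool char_consumed = false;

    while (flush ? lx->state != TOY_START : !char_consumed) {
        char_consumed = toy_step(lx, ch, flush);
    }
}
```

After feeding 'a' and 'b', toy_flush_once() leaves this machine stuck
in TOY_TRAILER with no token emitted, while toy_feed_char(&lx, 0, true)
takes both transitions, emits the token, and ends in TOY_START.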

Signed-off-by: Markus Armbruster <address@hidden>
---
 qobject/json-lexer.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index 4867839f66..ec3aec726f 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -261,7 +261,8 @@ void json_lexer_init(JSONLexer *lexer, bool enable_interpolation)
 
 static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
 {
-    int char_consumed, new_state;
+    int new_state;
+    bool char_consumed = false;
 
     lexer->x++;
     if (ch == '\n') {
@@ -269,11 +270,12 @@ static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
         lexer->y++;
     }
 
-    do {
+    while (flush ? lexer->state != lexer->start_state : !char_consumed) {
         assert(lexer->state <= ARRAY_SIZE(json_lexer));
         new_state = json_lexer[lexer->state][(uint8_t)ch];
-        char_consumed = !TERMINAL_NEEDED_LOOKAHEAD(lexer->state, new_state);
-        if (char_consumed && !flush) {
+        char_consumed = !flush
+            && !TERMINAL_NEEDED_LOOKAHEAD(lexer->state, new_state);
+        if (char_consumed) {
             g_string_append_c(lexer->token, ch);
         }
 
@@ -318,7 +320,7 @@ static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
             break;
         }
         lexer->state = new_state;
-    } while (!char_consumed && !flush);
+    }
 
     /* Do not let a single token grow to an arbitrarily large size,
      * this is a security consideration.
@@ -342,9 +344,8 @@ void json_lexer_feed(JSONLexer *lexer, const char *buffer, size_t size)
 
 void json_lexer_flush(JSONLexer *lexer)
 {
-    if (lexer->state != lexer->start_state) {
-        json_lexer_feed_char(lexer, 0, true);
-    }
+    json_lexer_feed_char(lexer, 0, true);
+    assert(lexer->state == lexer->start_state);
     json_message_process_token(lexer, lexer->token, JSON_END_OF_INPUT,
                                lexer->x, lexer->y);
 }
-- 
2.17.1

