[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 5/6] json: Eliminate lexer state IN_ERROR

From: Markus Armbruster
Subject: Re: [Qemu-devel] [PATCH 5/6] json: Eliminate lexer state IN_ERROR
Date: Fri, 31 Aug 2018 09:06:03 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)

Eric Blake <address@hidden> writes:

> On 08/27/2018 11:40 PM, Markus Armbruster wrote:
>>>>    typedef enum json_token_type {
>>>> -    JSON_MIN = 100,
>>>> +    JSON_ERROR = 0,             /* must be zero, see json_lexer[] */
>>>> +    /* Gap for lexer states */
>>>> +    JSON_LCURLY = 100,
>>> In an earlier version of this type of cleanup, you swapped the IN_ and
>>> JSON_ values and eliminated the gap, to make the overall table more
>>> compact (no storage wasted on any of the states in the gap between the
>>> two).
>>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg01178.html
>>> Is it still worth trying to minimize the gap between the two
>>> sequences, even if you now no longer swap them in order?
>> You caught me :)
>> Eliminating the gap actually enlarges the table.
> Rather, switching the order enlarges the table.
>>  I first got confused,
>> then measured the size change backwards to confirm my confused ideas.
>> When I looked at the patch again, I realized my mistake, and silently
>> dropped this part of the change.
> The size of the table is determined by the fact that we must
> initialize entry 0 (whether we spell it IN_ERROR or JSON_ERROR), then
> pay attention to the largest value assigned.  Re-reading json_lexer[],
> you are only initializing IN_* states, and not JSON_* states;


The JSON_* states other than JSON_ERROR all go to the start state
regardless of lookahead and without consuming it.  We implement that
state transition in code instead of putting it into the table:

        case JSON_STRING:
            json_message_process_token(lexer, lexer->token, new_state,
                                       lexer->x, lexer->y);
            /* fall through */
        case JSON_SKIP:
            g_string_truncate(lexer->token, 0);
            /* fall through */
        case IN_START:
-->         new_state = lexer->start_state;

JSON_ERROR goes to IN_RECOVERY instead:

        case JSON_ERROR:
            json_message_process_token(lexer, lexer->token, JSON_ERROR,
                                       lexer->x, lexer->y);
--->        new_state = IN_RECOVERY;
            /* fall through */
        case IN_RECOVERY:

>                                                               swapping
> JSON_* to come first enlarged the table because you now have a bunch
> of additional rows in the table that are all 0-initialized to
> JSON_ERROR transitions.

Yes.  These rows are never used.

> So at the end of the day, leaving IN_* to be first, and putting JSON_*
> second, makes sense.
> The question remains, then, if a fixed-size gap (by making JSON_MIN be
> exactly 100) is any smarter than a contiguous layout (by making
> JSON_MIN be IN_START_INTERP + 1).  I can't see any strong reason for
> preferring one form over the other, so keeping the gap doesn't hurt.

The gap lets us hide the IN_* in json-lexer.c.  Not sure it's worth the
trouble.  We could move the IN_* to json-parser-int.h for simplicity.
Not sure that's worth the trouble, either :)

reply via email to

[Prev in Thread] Current Thread [Next in Thread]