bison-patches
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] scan-gram.l: for flex-2.5.31: move unput call into body


From: Paul Eggert
Subject: Re: [PATCH] scan-gram.l: for flex-2.5.31: move unput call into body
Date: 26 Apr 2003 23:36:02 -0700
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3

Akim Demaille <address@hidden> writes:

> | -  <<EOF>>   unexpected_end_of_file (token_start, "\"");
> | +  <<EOF>>   unexpected_eof (token_start, "\""); BEGIN INITIAL;
> 
> Why don't you return the string here?  That's what I'm used to do, and
> it is effective.  Actually, I never quite understood why you use unput
> to insert the missing ending, instead of merely reseting the start
> condition, emit an error, and then return the token anyway.  This way,
> this part of the patch:
> 
> |     * tests/regression.at (Invalid inputs): Remove cascaded diagnostic
> |     that is no longer emitted after the above change.
> 
> can be reverted.

It's a long story, but you asked, so here it is....

I used 'unput' to avoid duplicate and error-prone code.  For example,
in Bison 1.75 src/scan-gram.l:

        <SC_ESCAPED_STRING>
        {
          "\"" {
            assert (yy_top_state () == INITIAL);
            YY_OBS_GROW;
            YY_OBS_FINISH;
            yylval->string = last_string;
            yy_pop_state ();
            rule_length++;
            return STRING;
          }

          [^\"\n\r\\]+      YY_OBS_GROW;

          {eols}    obstack_1grow (&string_obstack, '\n'); YY_LINES;

          <<EOF>> {
            LOCATION_PRINT (stderr, *yylloc);
            fprintf (stderr, _(": unexpected end of file in a string\n"));
            assert (yy_top_state () == INITIAL);
            YY_OBS_FINISH;
            yylval->string = last_string;
            yy_pop_state ();
            return STRING;
          }
        }

Here, the <<EOF>> case is nearly the same as the "\"" case, and subtle
bugs tend to creep in as maintainers modify one case but forget to
modify the other.  (Q. Why does the "\"" case increment rule_length,
but the <<EOF>> case does not?  A. It's probably a bug -- who knows?
Q. Why does the <<EOF>> case not append " to the string, to close it
properly?  A. Beats me.)

When I could use unput, this got quite a bit simpler.  E.g., in Bison
1.875a:

        <SC_ESCAPED_STRING>
        {
          "\"" {
            STRING_GROW;
            STRING_FINISH;
            loc->start = token_start;
            val->chars = last_string;
            rule_length++;
            BEGIN INITIAL;
            return STRING;
          }

          .|\n      STRING_GROW;
          <<EOF>>   unexpected_end_of_file (token_start, "\"");
        }

Here, the <<EOF>> case simply pushed " onto the input, which caused
the "\"" case to happen next.

Unfortunately unput is not portable as mentioned by Peter Muir in
<http://mail.gnu.org/archive/html/bison-patches/2003-04/msg00002.html>.
However, I didn't want to go back to the Bison 1.75 method due to its
error-prone duplicate code.  So insted I merely changed the <<EOF>>
case of the 1.875a method to do this:

          <<EOF>>   unexpected_eof (token_start, "\""); BEGIN INITIAL;

unexpected_eof does not unput its 2nd argument, so this version does
not pretend that the input string was correctly closed.

The main reason I did this was convenience of implementation.
However, one can argue that it's more logically correct: if the input
does not contain a complete string, the lexer shouldn't pretend that
the input contained a complete string.  Also, it's generally more
convenient to generate just one diagnostic rather than multiple
diagnostics for the same problem.

Regardless of one's position about the desirability of the change, I
don't think the changed behavior matters much to the ordinary Bison
user.  After all, if the input is wrong at EOF, who cares whether
Bison generates two diagnostics rather than one?

My sense is that you're suggesting that the code look like this instead:

        <SC_ESCAPED_STRING>
        {
          "\"" {
            STRING_GROW;
            STRING_FINISH;
            loc->start = token_start;
            val->chars = last_string;
            rule_length++;
            BEGIN INITIAL;
            return STRING;
          }

          .|\n      STRING_GROW;

          <<EOF>> {
            unexpected_eof (token_start, "\"");
            obstack_sgrow (&obstack_for_string, "\"");
            STRING_FINISH;
            loc->start = token_start;
            val->chars = last_string;
            rule_length++;
            BEGIN INITIAL;
            return STRING;
          }
        }

But this would bring back the duplication of code again, and I don't
see the advantage of this approach, given the discussion above.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]