[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: %destructor can leak the lookahead
From: |
Akim Demaille |
Subject: |
Re: %destructor can leak the lookahead |
Date: |
Fri, 03 Sep 2004 10:58:03 +0200 |
User-agent: |
Gnus/5.1006 (Gnus v5.10.6) Emacs/21.3 (gnu/linux) |
>>> "Paul" == Paul Eggert <address@hidden> writes:
> Akim Demaille <address@hidden> writes:
>> There is one case where %destructor is not triggered, but it should:
>> when the error recovery just aborts when we were popping the stack and
>> we reach its bottom.
> Can you please explain this leak a bit more? I just looked at yacc.c
> and found these snippets:
> YYPOPSTACK;
> if (yyssp == yyss)
> YYABORT;
> YYDSYMPRINTF ("Error: popping", yystos[*yyssp], yyvsp, yylsp);
> yydestruct (yystos[*yyssp], yyvsp]b4_location_if([, yylsp])[);
> ...
> if (yyssp == yyss)
> YYABORT;
> YYDSYMPRINTF ("Error: popping", yystos[*yyssp], yyvsp, yylsp);
> yydestruct (yystos[yystate], yyvsp]b4_location_if([, yylsp])[);
Once again I forgot to repeat important information provided only in
the subject: I'm referring to the lookahead. Just before the YYABORT
you quote (which corresponds to the case where we popped the whole
stack without finding a place to handle the error token) there should
be an yydestruct of the lookahead.
> Is it that we need to destroy the semantic value of the error token
> itself, before executing the first code snippet quoted above?
Hm, I never thought about this possibility. Can you really pass some
semantic value to error? Gee. This token is severely "overloaded",
how can you type it...
> I'm a bit rusty with this code, but I looked for other possible
> places where %destructor should be triggered but isn't and found these:
> * the stack overflows, and the user-defined yyoverflow macro refuses
> to extend the stack. This does a YYABORT without invoking destructors.
You're right, that should be handled too.
> * The user code invokes YYERROR. This pops "yylen" items without
> destroying them.
Definitely.
Maybe we should also free the stack and lookahead (but not $$) when
YYACCEPT is called.
>> One way to fix this leak is to just add a destructor call right
>> there. Another would be to put this destructor call into yyabortlab.
>> That would imply:
>>
>> - yydestruct can be called on EOF
>> So what? It should not be defined, and if the user meant to define
>> it, there could be good reasons.
> Sorry, I don't follow. Why should yydestruct not be defined here?
yydestruct will sure be defined, and, unless some explicit
"%destructor {} EOF" is given, nothing will happen.
>> - yydestruct is called on the lookahead if the user explicitly calls
>> YYABORT.
>>
>> Because of this last reason, I'm tempted to release the lookahead at
>> yyerrorlab.
> Sorry, I still don't follow. Perhaps I'm misunderstanding the meaning
> of destructors in general. From the documentation, destructors are
> called for all symbols that are "discarded by the parser". But an
> ordinary reduce operation discards symbols, right? So shouldn't
> destructors be invoked every time the stack shrinks, no matter why it
> shrinks?
In the case of a reduction, it is completely different: the user is
given the possibility to deal with the memory all by herself. If she
wants to free, then she can. If she wants to aggregate some $n into a
bigger $$, then be it. We should not interfere with this [1].
But error recovery is not under user control. She cannot reclaim the
memory we hid from her. Whence %destructor.
[1] There is one other possibility where I'm considering calling
yydestruct: when a rhs symbol is not mentioned in the action. But I
see this is already reported in the documentation.
Here is my proposal to handle the sole case of the lookahead. Also, I
think we should factor YY_SYMBOL_PRINT calls into yydestruct itself.
Index: ChangeLog
from Akim Demaille <address@hidden>
* data/glr.c, data/lalr1.cc, data/yacc.c: When YYABORT was
invoked, yydestruct the lookahead.
* tests/calc.at (Calculator $1): Update the expected lengths of
traces: there is an added line for the discarded lookahead.
* doc/bison.texinfo (Destructor Decl): Some rewording.
Define "discarded" symbols.
Index: data/glr.c
===================================================================
RCS file: /cvsroot/bison/bison/data/glr.c,v
retrieving revision 1.72
diff -u -u -r1.72 glr.c
--- data/glr.c 2 Sep 2004 14:27:02 -0000 1.72
+++ data/glr.c 3 Sep 2004 08:57:48 -0000
@@ -1916,7 +1916,13 @@
yyposn = yystack.yytops.yystates[0]->yyposn;
}
yyDone:
- ;
+ /* On YYABORT, free the lookahead. */
+ if (yystack.yyerrflag == 1 && yytoken != YYEMPTY)
+ {
+ YY_SYMBOL_PRINT ("Error: discarding lookahead",
+ yytoken, yylvalp, yyllocp);
+ yydestruct (yytoken, yylvalp]b4_location_if([, yyllocp])[);
+ }
yyfreeGLRStack (&yystack);
return yystack.yyerrflag;
Index: data/lalr1.cc
===================================================================
RCS file: /cvsroot/bison/bison/data/lalr1.cc,v
retrieving revision 1.50
diff -u -u -r1.50 lalr1.cc
--- data/lalr1.cc 2 Sep 2004 14:30:55 -0000 1.50
+++ data/lalr1.cc 3 Sep 2004 08:57:48 -0000
@@ -714,6 +714,10 @@
/* Abort. */
yyabortlab:
+ /* Free the lookahead. */
+ YY_SYMBOL_PRINT ("Error: discarding lookahead", ilooka_, &value, &location);
+ destruct_ (ilooka_, &value, &location);
+ looka_ = empty_;
return 1;
}
Index: data/yacc.c
===================================================================
RCS file: /cvsroot/bison/bison/data/yacc.c,v
retrieving revision 1.65
diff -u -u -r1.65 yacc.c
--- data/yacc.c 2 Sep 2004 14:27:02 -0000 1.65
+++ data/yacc.c 3 Sep 2004 08:57:48 -0000
@@ -1209,6 +1209,9 @@
| yyabortlab -- YYABORT comes here. |
`-----------------------------------*/
yyabortlab:
+ YY_SYMBOL_PRINT ("Error: discarding lookahead", yytoken, &yylval, &yylloc);
+ yydestruct (yytoken, &yylval]b4_location_if([, &yylloc])[);
+ yychar = YYEMPTY;
yyresult = 1;
goto yyreturn;
Index: doc/bison.texinfo
===================================================================
RCS file: /cvsroot/bison/bison/doc/bison.texinfo,v
retrieving revision 1.129
diff -u -u -r1.129 bison.texinfo
--- doc/bison.texinfo 26 Aug 2004 13:05:41 -0000 1.129
+++ doc/bison.texinfo 3 Sep 2004 08:57:49 -0000
@@ -787,7 +787,7 @@
value of @samp{a} from the outer scope. So this approach cannot
work.
-A simple solution to this problem is to declare the parser to
+A simple solution to this problem is to declare the parser to
use the @acronym{GLR} algorithm.
When the @acronym{GLR} parser reaches the critical state, it
merely splits into two branches and pursues both syntax rules
@@ -871,7 +871,7 @@
The parser can be turned into a @acronym{GLR} parser, while also telling Bison
to be silent about the one known reduce/reduce conflict, by
-adding these two declarations to the Bison input file (before the first
+adding these two declarations to the Bison input file (before the first
@samp{%%}):
@example
@@ -893,7 +893,7 @@
intended. A @acronym{GLR} parser splitting inadvertently may cause
problems less obvious than an @acronym{LALR} parser statically choosing the
wrong alternative in a conflict.
-Second, consider interactions with the lexer (@pxref{Semantic Tokens})
+Second, consider interactions with the lexer (@pxref{Semantic Tokens})
with great care. Since a split parser consumes tokens
without performing any actions during the split, the lexer cannot
obtain information via parser actions. Some cases of
@@ -977,20 +977,20 @@
@samp{x} as an @code{ID}).
Bison detects this as a reduce/reduce conflict between the rules
@code{expr : ID} and @code{declarator : ID}, which it cannot resolve at the
-time it encounters @code{x} in the example above. Since this is a
address@hidden parser, it therefore splits the problem into two parses, one for
+time it encounters @code{x} in the example above. Since this is a
address@hidden parser, it therefore splits the problem into two parses, one for
each choice of resolving the reduce/reduce conflict.
Unlike the example from the previous section (@pxref{Simple GLR Parsers}),
however, neither of these parses ``dies,'' because the grammar as it stands is
-ambiguous. One of the parsers eventually reduces @code{stmt : expr ';'} and
-the other reduces @code{stmt : decl}, after which both parsers are in an
-identical state: they've seen @samp{prog stmt} and have the same unprocessed
-input remaining. We say that these parses have @dfn{merged.}
+ambiguous. One of the parsers eventually reduces @code{stmt : expr ';'} and
+the other reduces @code{stmt : decl}, after which both parsers are in an
+identical state: they've seen @samp{prog stmt} and have the same unprocessed
+input remaining. We say that these parses have @dfn{merged.}
At this point, the @acronym{GLR} parser requires a specification in the
grammar of how to choose between the competing parses.
In the example above, the two @code{%dprec}
-declarations specify that Bison is to give precedence
+declarations specify that Bison is to give precedence
to the parse that interprets the example as a
@code{decl}, which implies that @code{x} is a declarator.
The parser therefore prints
@@ -1007,7 +1007,7 @@
@end example
@noindent
-This is another example of using @acronym{GLR} to parse an unambiguous
+This is another example of using @acronym{GLR} to parse an unambiguous
construct, as shown in the previous section (@pxref{Simple GLR Parsers}).
Here, there is no ambiguity (this cannot be parsed as a declaration).
However, at the time the Bison parser encounters @code{x}, it does not
@@ -1066,7 +1066,7 @@
@end example
Bison requires that all of the
-productions that participate in any particular merge have identical
+productions that participate in any particular merge have identical
@samp{%merge} clauses. Otherwise, the ambiguity would be unresolvable,
and the parser will report an error during any parse that results in
the offending merge.
@@ -3734,14 +3734,13 @@
@cindex freeing discarded symbols
@findex %destructor
-Some symbols can be discarded by the parser, typically during error
-recovery (@pxref{Error Recovery}). Basically, during error recovery,
-embarrassing symbols already pushed on the stack, and embarrassing
-tokens coming from the rest of the file are thrown away until the parser
-falls on its feet. If these symbols convey heap based information, this
-memory is lost. While this behavior is tolerable for batch parsers,
-such as in compilers, it is unacceptable for parsers that can
-possibility ``never end'' such as shells, or implementations of
+Some symbols can be discarded by the parser. For instance, during error
+recovery (@pxref{Error Recovery}), embarrassing symbols already pushed
+on the stack, and embarrassing tokens coming from the rest of the file
+are thrown away until the parser falls on its feet. If these symbols
+convey heap based information, this memory is lost. While this behavior
+can be tolerable for batch parsers, such as in compilers, it is not for
+possibly ``never ending'' parsers such as shells, or implementations of
communication protocols.
The @code{%destructor} directive allows for the definition of code that
@@ -3794,6 +3793,22 @@
typefull: string; // $$ = $1 applies, $1 is not destroyed.
@end smallexample
address@hidden 1
+
address@hidden discarded symbols
address@hidden symbols} are the following:
+
address@hidden
address@hidden
+stacked symbols popped during the first phase of error recovery,
address@hidden
+incoming terminals during the second phase of error recovery,
address@hidden
+the current lookahead when the parser aborts (either via an explicit
+call to @code{YYABORT}, or as a consequence of a failed error recovery).
address@hidden itemize
+
+
@node Expect Decl
@subsection Suppressing Conflict Warnings
@cindex suppressing conflict warnings
Index: tests/calc.at
===================================================================
RCS file: /cvsroot/bison/bison/tests/calc.at,v
retrieving revision 1.63
diff -u -u -r1.63 calc.at
--- tests/calc.at 21 Jun 2004 20:20:31 -0000 1.63
+++ tests/calc.at 3 Sep 2004 08:57:49 -0000
@@ -466,21 +466,21 @@
[486])
# Some syntax errors.
-_AT_CHECK_CALC_ERROR([$1], [1], [0 0], [11],
+_AT_CHECK_CALC_ERROR([$1], [1], [0 0], [12],
[1.2: syntax error, unexpected "number"])
-_AT_CHECK_CALC_ERROR([$1], [1], [1//2], [15],
+_AT_CHECK_CALC_ERROR([$1], [1], [1//2], [16],
[1.2: syntax error, unexpected '/', expecting "number" or
'-' or '(' or '!'])
-_AT_CHECK_CALC_ERROR([$1], [1], [error], [4],
+_AT_CHECK_CALC_ERROR([$1], [1], [error], [5],
[1.0: syntax error, unexpected $undefined])
-_AT_CHECK_CALC_ERROR([$1], [1], [1 = 2 = 3], [22],
+_AT_CHECK_CALC_ERROR([$1], [1], [1 = 2 = 3], [23],
[1.6: syntax error, unexpected '='])
_AT_CHECK_CALC_ERROR([$1], [1],
[
+1],
- [14],
+ [15],
[2.0: syntax error, unexpected '+'])
# Exercise error messages with EOF: work on an empty file.
-_AT_CHECK_CALC_ERROR([$1], [1], [/dev/null], [4],
+_AT_CHECK_CALC_ERROR([$1], [1], [/dev/null], [5],
[1.0: syntax error, unexpected "end of input"])
# Exercise the error token: without it, we die at the first error,