bison-patches
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: {maint} yacc.c: initialize yylval in pure-parser mode


From: Akim Demaille
Subject: Re: {maint} yacc.c: initialize yylval in pure-parser mode
Date: Sat, 1 Sep 2012 11:26:26 +0200

Le 31 août 2012 à 20:04, Paul Eggert a écrit :

> On 08/31/2012 09:02 AM, Akim Demaille wrote:
>> I would very much like to have your opinion on this regard.
>> yacc.c has never initialized yylval, and this can actually
>> trigger warnings from GCC in some situations.
> 
> I have some qualms about putting in initializations just
> to pacify GCC.  But I'm afraid I don't have enough context
> to be precise about my qualms.  So I hope you don't mind if
> I ask a few questions….

Of course not, that's what I'm asking for :)

> Why aren't these warnings valid warnings?  That is, why don't
> they indicate real bugs in the generated parser, where it
> does not initialize a value that it's supposed to?

I must confess that I don't really understand how GCC computes
this, as shown in the example below.  But I do understand that
it can think that the values are used uninitialized, as it is
really the case in some, agreed, extreme situations.

Consider for instance the case of a grammar that manipulates
no semantic values at all.  In that case, yylval is, of
course, never initialized, yet the parser will maintain the
semantic value stack, and copy yylval onto it.  Since the
user cannot tell that she is not using any semantic value, this
would potentially trigger such warnings.  Actually, if GCC
were smart enough, it should even complain about yylval being
useless.

Yet, in other situations, I cannot understand why GCC complains
for instance with this yylex that does set yylval in every
situation:

> int yylex (YYSTYPE *lvalp)
> {
>   static size_t toknum = 0;
>   int res = toknum++;
>   *lvalp = toknum;
>   return res;
> }

(the "static" matters, and further simplifications kill the
following warning:

> $ gcc-mp-4.8 -Wall -O2 input.c
> input.c: In function 'yyparse':
> input.c:1503:12: warning: 'yylval' may be used uninitialized in this function 
> [-Wmaybe-uninitialized]
>    *++yyvsp = yylval;
>             ^
> input.c:1090:9: note: 'yylval' was declared here
>  YYSTYPE yylval;
>          ^
)

but it does not with this one that never sets yylval!:

> int yylex (YYSTYPE *lvalp)
> {
>   return 0;
> }

I guess it is lost in the gotos and tables of yyparse, so
I tried to strip a generated parser to get something simpler,
see the attached foo2.c (actually, I was quite surprised to see
that it was performing inter-procedural analysis here, as
it does look into yylex to see when yyparse's yylval is assigned).
But I failed to get something more understandable.

Yet...

> Does the proposed patch cause the parser to behave
> differently from before?  Is there some way that we
> can characterize those differences, so that we can
> document the new behavior?

If one writes a broken scanner that fails to assign a valid
yylval for the first token, then the difference is that
now the value is 0 instead of random.  But then, anyway,
we offer no protection against forgetting to set yylval
between tokens, so we might return the same yylval even
if the token types are completely different.

(In fact, I would really like to shorten the scope of some
of the variables of yyparse.  In particular, yylval could
be local to where yylex is called.  In that case a tool
such as valgrind/mudflap could complain about the second token
using an yylval that was not initialized, whereas with
a yylval which has yyparse as scope, it would have no
reason to complain.)

If one writes a parser that does not manipulate semantic
values, these warnings are legitimate from the point
of view of the compiler, but do not help and are not
legitimate from the point of view of the Bison user.

glr.c already initializes its yylval this way.  So really,
I see no visible change here.  The warning does annoy
some users though.  For instance:

http://diswww.mit.edu:8008/menelaus.mit.edu/krb5-bugs/12515

kerberos now uses a pragma to silence -Wmaybe-uninitialized
in the generated parser :(

There are other instances,
www.google.com/search?q=yylval+may+be+used+uninitialized

So I think we should make this change.  But I now realize
it should be documented in NEWS, so I propose the following
updated patch.

From 2a3c5d6d47b4c5f00704b808f1f2a685c8f9bdb8 Mon Sep 17 00:00:00 2001
From: Akim Demaille <address@hidden>
Date: Sat, 31 Mar 2012 10:32:10 +0200
Subject: [PATCH 1/2] yacc.c: initialize yylval for pure parsers

Tests are running without -O since
f377f69fec28013c79db4efe12bbb9d48987fb2c because some warnings (about
yylval not being initialized) show only when GCC is given -O2.
Rather, fix the warnings by initializing yylval, and run the test
suite with compiler options unmodified.

* data/yacc.c (b4_declare_scanner_communication_variables): When pure
parser, initialize yylval with zeros as is done in glr.c.
When impure, yylval might have been initialized already by the user,
do not try to override this value.
* tests/atlocal.in (O0CFLAGS, O0CXXFLAGS): Remove, use the
regular CFLAGS and CXXFLAGS.
---
 NEWS             | 15 +++++++++++++++
 data/yacc.c      | 10 ++++++++--
 tests/atlocal.in | 11 ++---------
 3 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/NEWS b/NEWS
index 7fdbc03..8cd7d83 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,21 @@ GNU Bison NEWS
 
 * Noteworthy changes in release ?.? (????-??-??) [?]
 
+** yylval is now initialized (yacc.c)
+
+  When generating pure parsers (%define api.pure), yylval is now
+  initialized.  Using GCC pragmas to avoid warnings such as
+
+    input.c: In function 'yyparse':
+    input.c:1503:12: warning: 'yylval' may be used uninitialized in this
+                              function [-Wmaybe-uninitialized]
+       *++yyvsp = yylval;
+                ^
+
+  is no longer needed.  This initialization is not performed for non pure
+  parsers: yylval is a global variable, whose value might be initialized by
+  the user before calling yyparse.  But in that situation, GCC is not known
+  to issue warnings anyway.
 
 * Noteworthy changes in release 2.6.2 (2012-08-03) [stable]
 
diff --git a/data/yacc.c b/data/yacc.c
index 2c57f01..0a33734 100644
--- a/data/yacc.c
+++ b/data/yacc.c
@@ -166,12 +166,18 @@ m4_define([b4_rhs_location],
 # ------------------------------------------
 # Declare the variables that are global, or local to YYPARSE if
 # pure-parser.
+#
+# Do not override yylval when it is global, as the user might have
+# already done so.
 m4_define([b4_declare_scanner_communication_variables], [[
 /* The lookahead symbol.  */
-int yychar;
+int yychar;]b4_pure_if([
+
+/* Default (constant) value used for initialization.  */
+static YYSTYPE yyval_default;])[
 
 /* The semantic value of the lookahead symbol.  */
-YYSTYPE yylval;]b4_locations_if([[
+YYSTYPE yylval]b4_pure_if([ = yyval_default])[;]b4_locations_if([[
 
 /* Location data for the lookahead symbol.  */
 YYLTYPE yylloc;]])b4_pure_if([], [[
diff --git a/tests/atlocal.in b/tests/atlocal.in
index d059d63..d64140f 100644
--- a/tests/atlocal.in
+++ b/tests/atlocal.in
@@ -29,16 +29,10 @@ CPPFLAGS="-I$abs_top_builddir/lib @CPPFLAGS@"
 # Is the compiler GCC?
 GCC='@GCC@'
 
-# We want no optimization, as they uncover warnings (therefore,
-# failures) about uninitialized variables in the test suite.  FIXME:
-# fix the warnings, not the flags.
-  O0CFLAGS=`echo '@CFLAGS@'   | sed 's/-O[0-9s] *//g'`
-O0CXXFLAGS=`echo '@CXXFLAGS@' | sed 's/-O[0-9s] *//g'`
-
 # Sometimes a test group needs to ignore gcc warnings, so it locally
 # sets CFLAGS to this.
-  NO_WERROR_CFLAGS="$O0CFLAGS   @WARN_CFLAGS@   @WARN_CFLAGS_TEST@"
-NO_WERROR_CXXFLAGS="$O0CXXFLAGS @WARN_CXXFLAGS@ @WARN_CXXFLAGS_TEST@"
+  NO_WERROR_CFLAGS='@CFLAGS@   @WARN_CFLAGS@   @WARN_CFLAGS_TEST@'
+NO_WERROR_CXXFLAGS='@CXXFLAGS@ @WARN_CXXFLAGS@ @WARN_CXXFLAGS_TEST@'
 
 # But most of the time, we want -Werror.
   CFLAGS="$NO_WERROR_CFLAGS   @WERROR_CFLAGS@"
@@ -51,7 +45,6 @@ BISON_CXX_WORKS='@BISON_CXX_WORKS@'
 if "$at_arg_compile_c_with_cxx"; then
   CC_IS_CXX=1
   CC=$CXX
-  O0CFLAGS=$O0CXXFLAGS
   NO_WERROR_CFLAGS=$NO_WERROR_CXXFLAGS
   CFLAGS=$CXXFLAGS
 else
-- 
1.7.11.5



Attachment: foo2.c
Description: Binary data


reply via email to

[Prev in Thread] Current Thread [Next in Thread]