From: Akim Demaille
Subject: bison-3.0 released [stable]
Date: Thu, 25 Jul 2013 18:29:22 +0200

The Bison team is very happy to announce the release of Bison 3.0, which
introduces many new features.  An executive summary would include: (i) deep
overhaul/improvements of the diagnostics, (ii) more versatile means to
describe semantic value types (including the ability to store genuine C++
objects in C++ parsers), (iii) push-parser interface extended to Java, and
(iv) parse-time semantic predicates for GLR parsers.

Here are the compressed sources:
  ftp://ftp.gnu.org/gnu/bison/bison-3.0.tar.gz   (3.1MB)
  ftp://ftp.gnu.org/gnu/bison/bison-3.0.tar.xz   (1.8MB)

Here are the GPG detached signatures[*]:
  ftp://ftp.gnu.org/gnu/bison/bison-3.0.tar.gz.sig
  ftp://ftp.gnu.org/gnu/bison/bison-3.0.tar.xz.sig

Use a mirror for higher download bandwidth:
  http://www.gnu.org/order/ftp.html

[*] Use a .sig file to verify that the corresponding file (without the
.sig suffix) is intact.  First, be sure to download both the .sig file
and the corresponding tarball.  Then, run a command like this:

  gpg --verify bison-3.0.tar.gz.sig

If that command fails because you don't have the required public key,
then run this command to import it:

  gpg --keyserver keys.gnupg.net --recv-keys 0DDCAA3278D5264E

and rerun the 'gpg --verify' command.

This release was bootstrapped with the following tools:
  Autoconf 2.69
  Automake 1.14
  Flex 2.5.37
  Gettext 0.18.3
  Gnulib v0.0-7982-g03e96cc

NEWS

* Noteworthy changes in release 3.0 (2013-07-25) [stable]

** WARNING: Future backward-incompatibilities!

  Like other GNU packages, Bison will start using some of the C99 features
  for its own code, especially the definition of variables after statements.
  The generated C parsers still aim at C90.

** Backward incompatible changes

*** Obsolete features

  Support for YYFAIL is removed (deprecated in Bison 2.4.2): use YYERROR.

  Support for yystype and yyltype is removed (deprecated in Bison 1.875):
  use YYSTYPE and YYLTYPE.

  Support for YYLEX_PARAM and YYPARSE_PARAM is removed (deprecated in Bison
  1.875): use %lex-param, %parse-param, or %param.

  Missing semicolons at the end of actions are no longer added (as announced
  in the release 2.5).

*** Use of YACC='bison -y'

  TL;DR: With Autoconf <= 2.69, pass -Wno-yacc to (AM_)YFLAGS if you use
  Bison extensions.

  Traditional Yacc generates 'y.tab.c' whatever the name of the input file.
  Therefore Makefiles written for Yacc expect 'y.tab.c' (and possibly
  'y.tab.h' and 'y.output') to be generated from 'foo.y'.

  To this end, for ages, AC_PROG_YACC, Autoconf's macro to look for an
  implementation of Yacc, was using Bison as 'bison -y'.  While it does
  ensure compatible output file names, it also enables warnings for
  incompatibilities with POSIX Yacc.  In other words, 'bison -y' triggers
  warnings for Bison extensions.

  Autoconf 2.70+ fixes this incompatibility by using YACC='bison -o y.tab.c'
  (which also generates 'y.tab.h' and 'y.output' when needed).
  Alternatively, disable Yacc warnings by passing '-Wno-yacc' to your Yacc
  flags (YFLAGS, or AM_YFLAGS with Automake).

** Bug fixes

*** The epilogue is no longer affected by internal #defines (glr.c)

  The glr.c skeleton uses defines such as #define yylval (yystackp->yyval) in
  generated code.  These weren't properly undefined before the inclusion of
  the user epilogue, so functions such as the following were butchered by the
  preprocessor expansion:

    int yylex (YYSTYPE *yylval);

  This is fixed: yylval, yynerrs, yychar, and yylloc are now valid
  identifiers for user-provided variables.
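
  The following C sketch illustrates why a leftover macro breaks such a
  declaration.  Only the macro text comes from the description above; the
  struct yy_stack, its storage, and store_value are simplified stand-ins
  invented for this illustration, not the actual glr.c definitions.

```c
#include <assert.h>

/* Simplified, hypothetical stand-ins for the GLR parser stack.  */
typedef int YYSTYPE;
struct yy_stack { YYSTYPE yyval; };
static struct yy_stack yystack_storage;
static struct yy_stack *yystackp = &yystack_storage;

/* Skeleton-internal shorthand, in the style used by the parser body.  */
#define yylval (yystackp->yyval)

static void
store_value (int v)
{
  assert (yystackp != 0);
  yylval = v;   /* expands to (yystackp->yyval) = v */
}

/* Without this #undef, the parameter name below would expand to
   '(yystackp->yyval)' and the declaration would not compile.  */
#undef yylval

/* User epilogue: 'yylval' is an ordinary identifier again.  */
static int
yylex (YYSTYPE *yylval)
{
  *yylval = 42;
  return 0;
}
```

  Removing the #undef line is enough to reproduce the kind of breakage
  described above.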

*** stdio.h is no longer needed when locations are enabled (yacc.c)

  Changes in Bison 2.7 introduced a dependency on FILE and fprintf when
  locations are enabled.  This is fixed.

*** Warnings about useless %pure-parser/%define api.pure are restored

** Diagnostics reported by Bison

  Most of these features were contributed by Théophile Ranquet and Victor
  Santet.

*** Carets

  Version 2.7 introduced caret errors, for a prettier output.  These are now
  activated by default.  The old format can still be used by invoking Bison
  with -fno-caret (or -fnone).

  Some error messages that reproduced excerpts of the grammar are now using
  the caret information only.  For instance on:

    %%
    exp: 'a' | 'a';

  Bison 2.7 reports:

    in.y: warning: 1 reduce/reduce conflict [-Wconflicts-rr]
    in.y:2.12-14: warning: rule useless in parser due to conflicts: exp: 'a' [-Wother]

  Now bison reports:

    in.y: warning: 1 reduce/reduce conflict [-Wconflicts-rr]
    in.y:2.12-14: warning: rule useless in parser due to conflicts [-Wother]
     exp: 'a' | 'a';
                ^^^

  and "bison -fno-caret" reports:

    in.y: warning: 1 reduce/reduce conflict [-Wconflicts-rr]
    in.y:2.12-14: warning: rule useless in parser due to conflicts [-Wother]

*** Enhancements of the -Werror option

  The -Werror=CATEGORY option is now recognized, and will treat the specified
  warnings as errors.  The warnings need not have been explicitly activated
  using the -W option; this is similar to what GCC 4.7 does.

  For example, given the following command line, Bison will treat both
  warnings related to POSIX Yacc incompatibilities and S/R conflicts as
  errors (and only those):

    $ bison -Werror=yacc,error=conflicts-sr input.y

  If no categories are specified, -Werror will make all active warnings into
  errors.  For example, the following line does the same as the previous
  example:

    $ bison -Werror -Wnone -Wyacc -Wconflicts-sr input.y

  (By default -Wconflicts-sr,conflicts-rr,deprecated,other is enabled.)

  Note that the categories in this -Werror option may not be prefixed with
  "no-". However, -Wno-error[=CATEGORY] is valid.

  Note that -y enables -Werror=yacc. Therefore it is now possible to require
  Yacc-like behavior (e.g., always generate y.tab.c), but to report
  incompatibilities as warnings: "-y -Wno-error=yacc".

*** The display of warnings is now richer

  The option that controls a given warning is now displayed:

    foo.y:4.6: warning: type clash on default action: <foo> != <bar> [-Wother]

  In the case of warnings treated as errors, the prefix is changed from
  "warning: " to "error: ", and the suffix is displayed, in a manner similar
  to GCC, as [-Werror=CATEGORY].

  For instance, where the previous version of Bison would report (and exit
  with failure):

    bison: warnings being treated as errors
    input.y:1.1: warning: stray ',' treated as white space

  it now reports:

    input.y:1.1: error: stray ',' treated as white space [-Werror=other]

*** Deprecated constructs

  The new 'deprecated' warning category flags obsolete constructs whose
  support will be discontinued.  It is enabled by default.  These warnings
  used to be reported as 'other' warnings.

*** Useless semantic types

  Bison now warns about useless (uninhabited) semantic types.  Since
  semantic types are not declared to Bison (they are defined in the opaque
  %union structure), it is %printer/%destructor directives about useless
  types that trigger the warning:

    %token <type1> term
    %type  <type2> nterm
    %printer    {} <type1> <type3>
    %destructor {} <type2> <type4>
    %%
    nterm: term { $$ = $1; };

    3.28-34: warning: type <type3> is used, but is not associated to any symbol
    4.28-34: warning: type <type4> is used, but is not associated to any symbol

*** Undefined but unused symbols

  Bison used to raise an error for undefined symbols that are not used in
  the grammar.  This is now only a warning.

    %printer    {} symbol1
    %destructor {} symbol2
    %type <type>   symbol3
    %%
    exp: "a";

*** Useless destructors or printers

  Bison now warns about useless destructors or printers.  In the following
  example, the printer for <type1>, and the destructor for <type2> are
  useless: all symbols of <type1> (token1) already have a printer, and all
  symbols of type <type2> (token2) already have a destructor.

    %token <type1> token1
           <type2> token2
           <type3> token3
           <type4> token4
    %printer    {} token1 <type1> <type3>
    %destructor {} token2 <type2> <type4>

*** Conflicts

  The warnings and error messages about shift/reduce and reduce/reduce
  conflicts have been normalized.  For instance on the following foo.y file:

    %glr-parser
    %%
    exp: exp '+' exp | '0' | '0';

  compare the previous version of bison:

    $ bison foo.y
    foo.y: conflicts: 1 shift/reduce, 2 reduce/reduce
    $ bison -Werror foo.y
    bison: warnings being treated as errors
    foo.y: conflicts: 1 shift/reduce, 2 reduce/reduce

  with the new behavior:

    $ bison foo.y
    foo.y: warning: 1 shift/reduce conflict [-Wconflicts-sr]
    foo.y: warning: 2 reduce/reduce conflicts [-Wconflicts-rr]
    $ bison -Werror foo.y
    foo.y: error: 1 shift/reduce conflict [-Werror=conflicts-sr]
    foo.y: error: 2 reduce/reduce conflicts [-Werror=conflicts-rr]

  When %expect or %expect-rr is used, such as with bar.y:

    %expect 0
    %glr-parser
    %%
    exp: exp '+' exp | '0' | '0';

  Former behavior:

    $ bison bar.y
    bar.y: conflicts: 1 shift/reduce, 2 reduce/reduce
    bar.y: expected 0 shift/reduce conflicts
    bar.y: expected 0 reduce/reduce conflicts

  New one:

    $ bison bar.y
    bar.y: error: shift/reduce conflicts: 1 found, 0 expected
    bar.y: error: reduce/reduce conflicts: 2 found, 0 expected

** Incompatibilities with POSIX Yacc

  The 'yacc' category is no longer part of '-Wall'; enable it explicitly
  with '-Wyacc'.

** Additional yylex/yyparse arguments

  The new directive %param declares additional arguments to both yylex and
  yyparse.  The %lex-param, %parse-param, and %param directives support one
  or more arguments.  Instead of

    %lex-param   {arg1_type *arg1}
    %lex-param   {arg2_type *arg2}
    %parse-param {arg1_type *arg1}
    %parse-param {arg2_type *arg2}

  one may now declare

    %param {arg1_type *arg1} {arg2_type *arg2}

** Types of values for %define variables

  Bison used to make no distinction between '%define foo bar' and '%define
  foo "bar"'.  The former is now called a 'keyword value', and the latter a
  'string value'.  A third kind was added: 'code values', such as '%define
  foo {bar}'.

  Keyword variables are used for fixed value sets, e.g.,

    %define lr.type lalr

  Code variables are used for values in the target language, e.g.,

    %define api.value.type {struct semantic_type}

  String variables are used for the remaining cases, e.g., file names.

** Variable api.token.prefix

  The variable api.token.prefix changes the way tokens are identified in
  the generated files.  This is especially useful to avoid collisions
  with identifiers in the target language.  For instance

    %token FILE for ERROR
    %define api.token.prefix {TOK_}
    %%
    start: FILE for ERROR;

  will generate the definition of the symbols TOK_FILE, TOK_for, and
  TOK_ERROR in the generated sources.  In particular, the scanner must
  use these prefixed token names, although the grammar itself still
  uses the short names (as in the sample rule given above).
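
  As a rough C sketch of the scanner side: the enum below mimics what the
  generated header could declare, but its numeric values are assumptions
  made for illustration, and classify_word and print_token are hypothetical
  helpers.  Without the prefix, FILE would clash with the stdio type and
  'for' with the C keyword.

```c
#include <assert.h>
#include <stdio.h>   /* declares the type FILE */
#include <string.h>

/* Prefixed token names, in the style api.token.prefix {TOK_} yields.
   The numeric values are illustrative assumptions.  */
enum yytokentype
{
  TOK_FILE = 258,
  TOK_for = 259,
  TOK_ERROR = 260
};

/* Hypothetical scanner helper: map a word to its token code, or 0 if
   the word is not one of the three tokens.  */
static int
classify_word (const char *word)
{
  assert (word != NULL);
  if (strcmp (word, "FILE") == 0)
    return TOK_FILE;
  if (strcmp (word, "for") == 0)
    return TOK_for;
  if (strcmp (word, "ERROR") == 0)
    return TOK_ERROR;
  return 0;
}

/* The stdio type FILE and the token TOK_FILE coexist without clashing.  */
static int
print_token (FILE *out, int token)
{
  return fprintf (out, "token %d\n", token);
}
```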

** Variable api.value.type

  This new %define variable supersedes the #define macro YYSTYPE.  The use
  of YYSTYPE is discouraged.  In particular, #defining YYSTYPE *and* either
  using %union or %defining api.value.type results in undefined behavior.

  Either define api.value.type, or use "%union":

    %union
    {
      int ival;
      char *sval;
    }
    %token <ival> INT "integer"
    %token <sval> STRING "string"
    %printer { fprintf (yyo, "%d", $$); } <ival>
    %destructor { free ($$); } <sval>

    /* In yylex().  */
    yylval.ival = 42; return INT;
    yylval.sval = "42"; return STRING;

  The %define variable api.value.type supports both keyword and code values.

  The keyword value 'union' means that the user provides genuine types, not
  union member names such as "ival" and "sval" above (WARNING: will fail if
  -y/--yacc/%yacc is enabled).

    %define api.value.type union
    %token <int> INT "integer"
    %token <char *> STRING "string"
    %printer { fprintf (yyo, "%d", $$); } <int>
    %destructor { free ($$); } <char *>

    /* In yylex().  */
    yylval.INT = 42; return INT;
    yylval.STRING = "42"; return STRING;

  The keyword value 'variant' is somewhat equivalent, but for C++ special
  provision is made to allow classes to be used (more about this below).

    %define api.value.type variant
    %token <int> INT "integer"
    %token <std::string> STRING "string"

  Code values (in braces) denote user defined types.  This is where YYSTYPE
  used to be used.

    %code requires
    {
      struct my_value
      {
        enum
        {
          is_int, is_string
        } kind;
        union
        {
          int ival;
          char *sval;
        } u;
      };
    }
    %define api.value.type {struct my_value}
    %token <u.ival> INT "integer"
    %token <u.sval> STRING "string"
    %printer { fprintf (yyo, "%d", $$); } <u.ival>
    %destructor { free ($$); } <u.sval>

    /* In yylex().  */
    yylval.u.ival = 42; return INT;
    yylval.u.sval = "42"; return STRING;
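
  Since the %destructor above calls free ($$), a scanner would normally
  hand over heap-allocated strings rather than string literals.  The
  following C sketch shows one way to fill such a value; make_int,
  make_string and release_value are hypothetical helpers, not part of
  Bison.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same layout as the user-defined semantic value type above.  */
struct my_value
{
  enum { is_int, is_string } kind;
  union { int ival; char *sval; } u;
};

/* Hypothetical helper: build an integer value.  */
static struct my_value
make_int (int i)
{
  struct my_value v;
  v.kind = is_int;
  v.u.ival = i;
  return v;
}

/* Hypothetical helper: build a string value that owns a heap copy of S,
   so that a later free () is valid.  */
static struct my_value
make_string (const char *s)
{
  struct my_value v;
  size_t len = strlen (s) + 1;
  v.kind = is_string;
  v.u.sval = malloc (len);
  assert (v.u.sval != NULL);
  memcpy (v.u.sval, s, len);
  return v;
}

/* Mirrors what a destructor for <u.sval> would do.  */
static void
release_value (struct my_value *v)
{
  if (v->kind == is_string)
    {
      free (v->u.sval);
      v->u.sval = NULL;
    }
}
```

  In yylex, one could then write 'yylval = make_string ("42"); return
  STRING;' under the assumptions above.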

** Variable parse.error

  This variable controls the verbosity of error messages.  The use of the
  %error-verbose directive is deprecated in favor of "%define parse.error
  verbose".

** Renamed %define variables

  The following variables have been renamed for consistency.  Backward
  compatibility is ensured, but upgrading is recommended.

    lr.default-reductions      -> lr.default-reduction
    lr.keep-unreachable-states -> lr.keep-unreachable-state
    namespace                  -> api.namespace
    stype                      -> api.value.type

** Semantic predicates

  Contributed by Paul Hilfinger.

  The new, experimental, semantic-predicate feature allows actions of the
  form "%?{ BOOLEAN-EXPRESSION }", which cause syntax errors (as for
  YYERROR) if the expression evaluates to 0, and are evaluated immediately
  in GLR parsers, rather than being deferred.  The result is that they allow
  the programmer to prune possible parses based on the values of run-time
  expressions.

** The directive %expect-rr is now an error in non GLR mode

  It used to be an error only if used in non GLR mode, _and_ if there were
  reduce/reduce conflicts.

** Tokens are numbered in their order of appearance

  Contributed by Valentin Tolmer.

  With '%token A B', A had a number less than the one of B.  However,
  precedence declarations used to generate a reversed order.  This is now
  fixed, and introducing tokens with any of %token, %left, %right,
  %precedence, or %nonassoc yields the same result.

  When mixing declarations of tokens with a literal character (e.g., 'a')
  or with an identifier (e.g., B) in a precedence declaration, Bison
  numbered the literal characters first.  For example

    %right A B 'c' 'd'

  would lead to the tokens declared in this order: 'c' 'd' A B.  Again, the
  input order is now preserved.

  These changes were made so that one can remove useless precedence and
  associativity declarations (i.e., map %nonassoc, %left or %right to
  %precedence, or to %token) and get exactly the same output.

** Useless precedence and associativity

  Contributed by Valentin Tolmer.

  When developing and maintaining a grammar, useless associativity and
  precedence directives are common.  They can be a nuisance: new ambiguities
  arising are sometimes masked because their conflicts are resolved due to
  the extra precedence or associativity information.  Furthermore, it can
  hinder the comprehension of a new grammar: one will wonder about the role
  of a precedence, where in fact it is useless.  The following changes aim
  at detecting and reporting these extra directives.

*** Precedence warning category

  A new category of warning, -Wprecedence, was introduced. It flags the
  useless precedence and associativity directives.

*** Useless associativity

  Bison now warns about symbols with a declared associativity that is never
  used to resolve conflicts.  In that case, using %precedence is sufficient;
  the parsing tables will remain unchanged.  Solving these warnings may raise
  useless precedence warnings, as the symbols no longer have associativity.
  For example:

    %left '+'
    %left '*'
    %%
    exp:
      "number"
    | exp '+' "number"
    | exp '*' exp
    ;

  will produce a

    warning: useless associativity for '+', use %precedence [-Wprecedence]
     %left '+'
           ^^^

*** Useless precedence

  Bison now warns about symbols with a declared precedence and no declared
  associativity (i.e., declared with %precedence), and whose precedence is
  never used.  In that case, the symbol can be safely declared with %token
  instead, without modifying the parsing tables.  For example:

    %precedence '='
    %%
    exp: "var" '=' "number";

  will produce a

    warning: useless precedence for '=' [-Wprecedence]
     %precedence '='
                 ^^^

*** Useless precedence and associativity

  In case of both useless precedence and associativity, the issue is flagged
  as follows:

    %nonassoc '='
    %%
    exp: "var" '=' "number";

  The warning is:

    warning: useless precedence and associativity for '=' [-Wprecedence]
     %nonassoc '='
               ^^^

** Empty rules

  With help from Joel E. Denny and Gabriel Rassoul.

  Empty rules (i.e., with an empty right-hand side) can now be explicitly
  marked by the new %empty directive.  Using %empty on a non-empty rule is
  an error.  The new -Wempty-rule warning reports empty rules without
  %empty.  On the following grammar:

    %%
    s: a b c;
    a: ;
    b: %empty;
    c: 'a' %empty;

  bison reports:

    3.4-5: warning: empty rule without %empty [-Wempty-rule]
     a: {}
        ^^
    5.8-13: error: %empty on non-empty rule
     c: 'a' %empty {};
            ^^^^^^

** Java skeleton improvements

  The constants for token names were moved to the Lexer interface.  Also, it
  is possible to add code to the parser's constructors using "%code init"
  and "%define init_throws".
  Contributed by Paolo Bonzini.

  The Java skeleton now supports push parsing.
  Contributed by Dennis Heimbigner.

** C++ skeletons improvements

*** The parser header is no longer mandatory (lalr1.cc, glr.cc)

  Using %defines is now optional.  Without it, the needed support classes
  are defined in the generated parser, instead of additional files (such as
  location.hh, position.hh and stack.hh).

*** Locations are no longer mandatory (lalr1.cc, glr.cc)

  Both lalr1.cc and glr.cc no longer require %locations.

*** syntax_error exception (lalr1.cc)

  The C++ parser features a syntax_error exception, which can be
  thrown from the scanner or from user rules to raise syntax errors.
  This facilitates reporting errors caught in sub-functions (e.g.,
  rejecting too large integral literals from a conversion function
  used by the scanner, or rejecting invalid combinations from a
  factory invoked by the user actions).

*** %define api.value.type variant

  This is based on a submission from Michiel De Wilde.  With help
  from Théophile Ranquet.

  In this mode, complex C++ objects can be used as semantic values.  For
  instance:

    %token <::std::string> TEXT;
    %token <int> NUMBER;
    %token SEMICOLON ";"
    %type <::std::string> item;
    %type <::std::list<std::string>> list;
    %%
    result:
      list  { std::cout << $1 << std::endl; }
    ;

    list:
      %empty        { /* Generates an empty string list. */ }
    | list item ";" { std::swap ($$, $1); $$.push_back ($2); }
    ;

    item:
      TEXT    { std::swap ($$, $1); }
    | NUMBER  { $$ = string_cast ($1); }
    ;

*** %define api.token.constructor

  When variants are enabled, Bison can generate functions to build the
  tokens.  This guarantees that the token type (e.g., NUMBER) is consistent
  with the semantic value (e.g., int):

    parser::symbol_type yylex ()
    {
      parser::location_type loc = ...;
      ...
      return parser::make_TEXT ("Hello, world!", loc);
      ...
      return parser::make_NUMBER (42, loc);
      ...
      return parser::make_SEMICOLON (loc);
      ...
    }

*** C++ locations

  There are operator- and operator-= for 'location'.  Negative line/column
  increments can no longer underflow the resulting value.



