Bison 3.5.91 released [beta]
From: Akim Demaille
Subject: Bison 3.5.91 released [beta]
Date: Wed, 29 Apr 2020 18:53:33 +0200
Hi all,
This is the second beta of Bison 3.6, which includes big changes
prompted by user feature requests. Dear users, we *need* feedback
about these new features: we *need* you to try them on your projects to
make sure they address your needs, and to make sure your requests were
properly understood.
I have personally experimented with these changes on PHP (!), and have
started looking at other projects, but that will not suffice.
Compared to the previous beta, the error token is now named YYerror,
and when returned from the scanner, the parser enters error-recovery
without emitting a syntax error.
The previous announcement message follows.
Cheers!
==================================================================
At the beginning of this year we started exploring how Bison could
address several shortcomings in the generation of syntax error
messages. I had sent an RFC
(https://lists.gnu.org/r/bison-patches/2020-01/msg00000.html), which
was already the second attempt to address this issue (the first one,
https://lists.gnu.org/r/bison-patches/2018-12/msg00088.html, had
several flaws).
Several answers, in particular from Christian Schoenebeck and Adrian
Vogelsgesang, helped to forge what is about to become Bison 3.6.
Here are the headlines for this release:
- the user can forge syntax error messages the way she wants.
- token string aliases can be internationalized, and UTF-8 sequences
are properly preserved.
- push parsers can ask at any moment for the list of acceptable token
kinds (PLUS, VARIABLE, etc.), which can be used to provide
syntax-driven autocompletion.
- the user is now given access to the symbol kind, which can be used
for instance in syntax error generation (e.g., changing the display
of specific tokens, or finding groups of tokens to report
"expected operator" instead of "expected +, or -, or *, or /",
etc.), or during autocompletion (e.g., on existing variable names
when the kind is VARIABLE).
These changes are *big*, with a large impact on the API. It would be
extremely painful to discover in a few months something that was not
anticipated and which would require redesigning the API.
So **please** take the time to play with this beta, in particular to
see how these features allow you to get rid of dirty hacks you needed
to customize your error messages (I listed several such examples in
https://lists.gnu.org/r/bison-patches/2020-01/msg00000.html).
FWIW, there are a few issues I would like to sort out before releasing
Bison 3.6:
- I'm toying with the idea that 'return YYERRCODE' from the scanner
would have the parser enter error-recovery _without_ outputting an
error message. This would allow scanners to generate precise
messages, and yet let error-recovery do its magic. One can still
return YYUNDEF to have the parser emit an error message, and enter
error-recovery.
- there are details about the Java implementation I would like to
discuss with someone knowledgeable in Java.
- I am still looking for someone to maintain the D skeleton.
The documentation is up to date, you should find all the needed
details in it. The NEWS about 3.6 are quite exhaustive, have a look
at them below.
Thanks in advance!
Akim
==================================================================
Here are the compressed sources:
https://ftp.gnu.org/gnu/bison/bison-3.5.91.tar.gz (5.1MB)
https://ftp.gnu.org/gnu/bison/bison-3.5.91.tar.xz (3.1MB)
Here are the GPG detached signatures[*]:
https://ftp.gnu.org/gnu/bison/bison-3.5.91.tar.gz.sig
https://ftp.gnu.org/gnu/bison/bison-3.5.91.tar.xz.sig
Use a mirror for higher download bandwidth:
https://www.gnu.org/order/ftp.html
[*] Use a .sig file to verify that the corresponding file (without the
.sig suffix) is intact. First, be sure to download both the .sig file
and the corresponding tarball. Then, run a command like this:
gpg --verify bison-3.5.91.tar.gz.sig
If that command fails because you don't have the required public key,
then run this command to import it:
gpg --keyserver keys.gnupg.net --recv-keys 0DDCAA3278D5264E
and rerun the 'gpg --verify' command.
This release was bootstrapped with the following tools:
Autoconf 2.69
Automake 1.16.2
Flex 2.6.4
Gettext 0.19.8.1
Gnulib v0.1-3370-g5ec4d920e
==================================================================
* Noteworthy changes in release 3.5.91 (2020-04-29) [beta]
** New features
*** Returning the error token
When the scanner returns an invalid token or the undefined token
(YYUNDEF), the parser generates an error message and enters error
recovery. Because of that error message, most scanners that find lexical
errors generate an error message, and then ignore the invalid input
without entering the error-recovery.
The scanners may now return YYerror, the error token, to enter the
error-recovery mode without triggering an additional error message. See
the bistromathic example.
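As an illustration, here is a minimal sketch of a hand-written scanner that
reports a lexical error itself and then returns YYerror. The enum below only
stands in for the header Bison would generate: the NUM and PLUS tokens, their
numeric values, the `input` cursor and the plain `int` yylval are assumptions
made for this sketch, not part of the release.

```c
#include <ctype.h>
#include <stdio.h>

/* Stand-in for the generated token kinds; the values and the NUM and
   PLUS tokens are assumptions for this sketch.  */
typedef enum yytokentype
{
  YYEOF = 0, YYerror = 256, YYUNDEF = 257, NUM = 258, PLUS = 259
} yytoken_kind_t;

static const char *input; /* Hypothetical cursor into the text to scan.  */
static int yylval;        /* Simplified semantic value.  */

int
yylex (void)
{
  while (*input == ' ')
    ++input;
  if (*input == '\0')
    return YYEOF;
  if (isdigit ((unsigned char) *input))
    {
      yylval = 0;
      while (isdigit ((unsigned char) *input))
        yylval = yylval * 10 + (*input++ - '0');
      return NUM;
    }
  if (*input == '+')
    {
      ++input;
      return PLUS;
    }
  /* Lexical error: emit a precise message here, then return YYerror so
     that the parser enters error recovery without a second message.  */
  fprintf (stderr, "invalid character: %c\n", *input);
  ++input;
  return YYerror;
}
```

Returning YYUNDEF instead of YYerror in the last branch would leave the
message to the parser, as described above.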
*** The bistromathic features internationalization
Its way to build the error message is more general and is easy to use in
other projects.
* Noteworthy changes in release 3.5.90 (2020-04-18) [beta]
** Backward incompatible changes
TL;DR: replace "#define YYERROR_VERBOSE 1" by "%define parse.error verbose".
The YYERROR_VERBOSE macro is no longer supported; the parsers that still
depend on it will now produce Yacc-like error messages (just "syntax
error"). It was superseded by the "%error-verbose" directive in Bison
1.875 (2003-01-01). Bison 2.6 (2012-07-19) clearly announced that support
for YYERROR_VERBOSE would be removed. Note that since Bison 3.0
(2013-07-25), "%error-verbose" is deprecated in favor of "%define
parse.error verbose".
** New features
*** Improved syntax error messages
Two new values for the %define parse.error variable offer more control to
the user. They are available in all the skeletons (C, C++, Java).
**** %define parse.error detailed
The behavior of "%define parse.error detailed" closely resembles that
of "%define parse.error verbose" with a few exceptions. First, it is safe
to use non-ASCII characters in token aliases (with 'verbose', the result
depends on the locale with which bison was run). Second, a yysymbol_name
function is exposed to the user, instead of the yytnamerr function and the
yytname table. Third, token internationalization is supported (see
below).
**** %define parse.error custom
With this directive, the user forges and emits the syntax error message
herself by defining the yyreport_syntax_error function. A new type,
yypcontext_t, captures the circumstances of the error, and provides the
user with functions to get details, such as yypcontext_expected_tokens to
get the list of expected token kinds.
A possible implementation of yyreport_syntax_error is:
    int
    yyreport_syntax_error (const yypcontext_t *ctx)
    {
      int res = 0;
      YY_LOCATION_PRINT (stderr, *yypcontext_location (ctx));
      fprintf (stderr, ": syntax error");
      // Report the tokens expected at this point.
      {
        enum { TOKENMAX = 10 };
        yysymbol_kind_t expected[TOKENMAX];
        int n = yypcontext_expected_tokens (ctx, expected, TOKENMAX);
        if (n < 0)
          // Forward errors to yyparse.
          res = n;
        else
          for (int i = 0; i < n; ++i)
            fprintf (stderr, "%s %s",
                     i == 0 ? ": expected" : " or",
                     yysymbol_name (expected[i]));
      }
      // Report the unexpected token.
      {
        yysymbol_kind_t lookahead = yypcontext_token (ctx);
        if (lookahead != YYSYMBOL_YYEMPTY)
          fprintf (stderr, " before %s", yysymbol_name (lookahead));
      }
      fprintf (stderr, "\n");
      return res;
    }
**** Token aliases internationalization
When the %define variable parse.error is set to `custom` or `detailed`,
one may specify which token aliases are to be translated using _(). For
instance
    %token
        PLUS   "+"
        MINUS  "-"
      <double>
        NUM _("double precision number")
      <symrec*>
        FUN _("function")
        VAR _("variable")
In that case the user must define _() and N_(), and yysymbol_name returns
the translated symbol (i.e., it returns '_("variable")' rather than
'"variable"'). In Java, the user must provide an i18n() function.
*** List of expected tokens (yacc.c)
Push parsers may invoke yypstate_expected_tokens at any point during
parsing (including even before submitting the first token) to get the list
of possible tokens. This feature can be used to propose autocompletion
(see below the "bistromathic" example).
It makes little sense to use this feature without enabling LAC (lookahead
correction).
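To give an idea of how such a list can drive autocompletion, here is a
minimal sketch of the filtering step only. It assumes the array of expected
kinds was already filled (in a real push parser, by
yypstate_expected_tokens). The `symbol_kind` enum, `symbol_name` (a stand-in
for yysymbol_name), the `variables` table and `completions` are hypothetical
names introduced for this sketch.

```c
#include <string.h>

/* Stand-in for the generated yysymbol_kind_t; the names and values are
   assumptions for this sketch.  */
typedef enum { S_PLUS, S_MINUS, S_EXIT, S_VAR } symbol_kind;

/* Stand-in for yysymbol_name.  */
static const char *
symbol_name (symbol_kind k)
{
  static const char *const names[] = { "+", "-", "exit", "variable" };
  return names[k];
}

/* Hypothetical table of the variables known to the application.  */
static const char *const variables[] = { "alpha", "eps", "pi" };
enum { NVARS = sizeof variables / sizeof variables[0] };

/* Store in OUT (capacity OUTMAX) the candidates that start with PREFIX,
   given the N kinds in EXPECTED.  Return the number of candidates.  */
int
completions (const symbol_kind *expected, int n, const char *prefix,
             const char **out, int outmax)
{
  size_t len = strlen (prefix);
  int count = 0;
  for (int i = 0; i < n; ++i)
    if (expected[i] == S_VAR)
      {
        /* Complete on existing variables, not on the word "variable".  */
        for (int v = 0; v < NVARS && count < outmax; ++v)
          if (strncmp (variables[v], prefix, len) == 0)
            out[count++] = variables[v];
      }
    else if (count < outmax
             && strncmp (symbol_name (expected[i]), prefix, len) == 0)
      out[count++] = symbol_name (expected[i]);
  return count;
}
```

The bistromathic example shipped with the release is the reference for how
to wire this to a real parser state; the sketch above only shows the
kind-dependent filtering.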
*** Deep overhaul of the symbol and token kinds
To avoid the confusion with types in programming languages, we now refer
to token and symbol "kinds" instead of token and symbol "types". The
documentation and error messages have been revised.
All the skeletons have been updated to use dedicated enum types rather
than integral types. Special symbols are now regular citizens, instead of
being declared in ad hoc ways.
**** Token kinds
The "token kind" is what is returned by the scanner, e.g., PLUS, NUMBER,
LPAREN, etc. While backward compatibility is of course ensured, users are
nonetheless invited to replace their uses of "enum yytokentype" by
"yytoken_kind_t".
This type now also includes tokens that were previously hidden: YYEOF (end
of input), YYUNDEF (undefined token), and YYERRCODE (error token). They
now have string aliases, internationalized when internationalization is
enabled. Therefore, by default, error messages now refer to "end of file"
(internationalized) rather than the cryptic "$end", or to "invalid token"
rather than "$undefined".
Therefore in most cases it is now useless to define the end-of-file token
as follows:
%token T_EOF 0 "end of file"
Rather simply use "YYEOF" in your scanner.
**** Symbol kinds
The "symbol kind" is what the parser actually uses. (Unless the
api.token.raw %define variable is used, the symbol kind of a terminal
differs from the corresponding token kind.)
They are now exposed as an enum, "yysymbol_kind_t".
This allows users to tailor the error messages the way they want, or to
process some symbols in a specific way in autocompletion (see the
bistromathic example below).
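For instance, a custom error reporter could collapse all the operators into
a single "operator" entry. The following is a minimal sketch of that
grouping. The `symbol_kind` enum and `symbol_name` stand in for the generated
yysymbol_kind_t and yysymbol_name, and `is_operator` and `format_expected`
are hypothetical helpers introduced for this sketch.

```c
#include <stdio.h>
#include <string.h>

/* Stand-in for the generated yysymbol_kind_t; the names and values are
   assumptions for this sketch.  */
typedef enum { S_PLUS, S_MINUS, S_STAR, S_SLASH, S_NUM } symbol_kind;

/* Stand-in for yysymbol_name.  */
static const char *
symbol_name (symbol_kind k)
{
  static const char *const names[] = { "+", "-", "*", "/", "number" };
  return names[k];
}

static int
is_operator (symbol_kind k)
{
  return k == S_PLUS || k == S_MINUS || k == S_STAR || k == S_SLASH;
}

/* Write into BUF (of SIZE > 0 bytes) a message listing the N kinds in
   EXPECTED, reporting all the operators as a single "operator".  */
void
format_expected (char *buf, size_t size, const symbol_kind *expected, int n)
{
  size_t used = 0;
  int seen_operator = 0;
  int items = 0;
  buf[0] = '\0';
  for (int i = 0; i < n; ++i)
    {
      const char *name = symbol_name (expected[i]);
      if (is_operator (expected[i]))
        {
          if (seen_operator)
            continue;
          seen_operator = 1;
          name = "operator";
        }
      int len = snprintf (buf + used, size - used, "%s%s",
                          items++ == 0 ? "expected " : " or ", name);
      if (len < 0 || (size_t) len >= size - used)
        break; /* Output error or truncation: stop here.  */
      used += (size_t) len;
    }
}
```

In a real parser the array of expected kinds would come from
yypcontext_expected_tokens, and the resulting text would be emitted from
yyreport_syntax_error.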
*** Modernize display of explanatory statements in diagnostics
Since Bison 2.7, output was indented four spaces for explanatory
statements. For example:
    input.y:2.7-13: error: %type redeclaration for exp
    input.y:1.7-11:     previous declaration
Since the introduction of caret-diagnostics, it became less clear. This
indentation has been removed and submessages are displayed similarly as in
GCC:
    input.y:2.7-13: error: %type redeclaration for exp
        2 | %type <float> exp
          |       ^~~~~~~
    input.y:1.7-11: note: previous declaration
        1 | %type <int> exp
          |       ^~~~~
Contributed by Victor Morales Cayuela.
*** C++
The token and symbol kinds are yy::parser::token_kind_type and
yy::parser::symbol_kind_type.
The symbol_type::kind() member function returns the kind of a
symbol. This can be used to write unit tests for scanners, e.g.,
yy::parser::symbol_type t = make_NUMBER ("123");
assert (t.kind () == yy::parser::symbol_kind::S_NUMBER);
assert (t.value.as<int> () == 123);
** Documentation
*** User Manual
In order to avoid ambiguities with "type" as in "typing", we now refer to
the "token kind" (e.g., `PLUS`, `NUMBER`, etc.) rather than the "token
type". We now also refer to the "symbol kind" (e.g., `PLUS`, `expr`,
etc.).
*** Examples
There are now two examples in examples/java: a very simple calculator, and
one that tracks locations to provide accurate error messages.
The lexcalc example (a simple example in C based on Flex and Bison) now
also demonstrates location tracking.
A new C example, bistromathic, is a fully featured interactive calculator
using many Bison features: pure interface, push parser, autocompletion
based on the current parser state (using yypstate_expected_tokens),
location tracking, internationalized custom error messages, lookahead
correction, rich debug traces, etc.
It shows how to depend on the symbol kinds to tailor autocompletion. For
instance it recognizes the symbol kind "VARIABLE" to propose
autocompletion on the existing variables, rather than on the word
"variable".