[PATCH 0/8] Revamp the handling token string aliases in error messages
From: Akim Demaille
Subject: [PATCH 0/8] Revamp the handling token string aliases in error messages
Date: Sat, 29 Dec 2018 17:30:19 +0100
Hi all,
This series of patches addresses two related shortcomings: currently
we destroy non-ASCII token strings (which ruins Hans' use of
mathematical symbols for instance), and we don't provide a means to
translate the token names in error messages.
See https://lists.gnu.org/archive/html/bison-patches/2018-11/msg00030.html.
Paul, I have completely removed your work that quoted the token names
in tname. In retrospect, I don't think we should have done that.
This way it becomes straightforward to translate these strings, as
shown by the "translate bison's own tokens" change. Bison's grammar
becomes:
  %token
    GRAM_EOF 0 _("end of file")
    STRING _("string")
    TSTRING _("translatable string")
and then I have:
  $ cat /tmp/wrong.y
  %token 12
  %%
  exp:
  $ LC_ALL=C ./_build/8d/tests/bison /tmp/wrong.y
  /tmp/wrong.y:1.8-9: error: syntax error, unexpected integer literal,
  expecting character literal or identifier or <tag>
   %token 12
          ^^
  $ ./_build/8d/tests/bison /tmp/wrong.y
  /tmp/wrong.y:1.8-9: erreur: erreur de syntaxe, littéral entier inattendu,
  attendait caractère littéral ou identifiant ou <tag>
   %token 12
          ^^
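To make the idea concrete, here is a rough sketch of the mechanism. It is not the code Bison generates: toy_tname, toy_translate and toy_token_name are hypothetical names, and toy_translate is a stand-in for gettext() backed by a three-entry catalog taken from the French output above. The table keeps the untranslated names, marked with the usual no-op N_() macro, and translation happens only when a name is fetched for a message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* N_() only marks a string for extraction; it does not translate.  */
#define N_(Msgid) Msgid

/* Hypothetical token name table: aliases are stored untranslated.
   "<tag>" is deliberately not marked for translation.  */
static const char *const toy_tname[] = {
  N_("integer literal"),
  N_("character literal"),
  N_("identifier"),
  "<tag>",
};

/* Toy stand-in for gettext(): look MSGID up in a tiny French catalog
   and fall back to the untranslated string when there is no entry.  */
static const char *toy_translate(const char *msgid)
{
  static const char *const catalog[][2] = {
    { "integer literal",   "littéral entier" },
    { "character literal", "caractère littéral" },
    { "identifier",        "identifiant" },
  };
  for (size_t i = 0; i < sizeof catalog / sizeof *catalog; ++i)
    if (strcmp(msgid, catalog[i][0]) == 0)
      return catalog[i][1];
  return msgid;
}

/* Name of token I as it should appear in an error message.  */
static const char *toy_token_name(size_t i)
{
  assert(i < sizeof toy_tname / sizeof *toy_tname);
  return toy_translate(toy_tname[i]);
}
```

Because the table stores the plain alias rather than a quoted and escaped form, the string handed to the translation function is exactly the msgid that was marked in the grammar.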
My changes also alter the signature of yytnamerr, which you made
overridable by the user with an #ifndef yytnamerr guard. This
customization point was never documented. I think that if we simply
rename it, existing code will continue to compile, but for people who
tweaked yytnamerr some error messages may change unexpectedly.
I also removed the support for trigraphs. Again, I would claim that
that's the user's problem.
However, I have broken a documented contract: the documentation
clearly specifies how tokens are stored in yytname:
-- Directive: %token-table
Generate an array of token names in the parser implementation file.
The name of the array is ‘yytname’; ‘yytname[I]’ is the name of the
token whose internal Bison token code number is I. The first three
elements of ‘yytname’ correspond to the predefined tokens ‘"$end"’,
‘"error"’, and ‘"$undefined"’; after these come the symbols defined
in the grammar file.
The name in the table includes all the characters needed to
represent the token in Bison. For single-character literals and
literal strings, this includes the surrounding quoting characters
and any escape sequences. For example, the Bison single-character
literal ‘'+'’ corresponds to a three-character name, represented in
C as ‘"'+'"’; and the Bison two-character literal string ‘"\\/"’
corresponds to a five-character name, represented in C as
‘"\"\\\\/\""’.
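For reference, the two examples from the documentation can be checked with a small sketch. The table below is a hypothetical hand-written excerpt that follows the documented convention, not a table produced by Bison, and example_tname_length and example_tname_is_literal are helper names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical excerpt of a yytname-style table following the
   documented (quoted) convention.  */
static const char *const example_tname[] = {
  "$end", "error", "$undefined",
  "'+'",        /* Bison character literal '+': quotes included.  */
  "\"\\\\/\"",  /* Bison literal string "\\/": quotes and escapes kept.  */
};

/* Number of characters stored for entry I.  */
static size_t example_tname_length(size_t i)
{
  assert(i < sizeof example_tname / sizeof *example_tname);
  return strlen(example_tname[i]);
}

/* Whether entry I is a quoted literal (character or string) rather
   than a symbol name, judging only by its first character.  */
static int example_tname_is_literal(size_t i)
{
  assert(i < sizeof example_tname / sizeof *example_tname);
  return example_tname[i][0] == '\'' || example_tname[i][0] == '"';
}
```

With this table, example_tname_length(3) is 3 and example_tname_length(4) is 5, matching the three-character and five-character names described in the documentation.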
I don't really see what people can do with this table. In
particular, it does not readily help to generate scanner rules
directly, since the mapping to the external token number (the one
returned by yylex) is not trivial and is not documented.
Rici, you might have some relevant input on this issue. If that's
really a problem, we can generate two tables: one for backward
compatibility (deprecated?), and the new one for error messages.
This series of patches is a starting point for discussing alternatives.
Nothing is cast in stone here.
I would really like to address this in 3.3, which I expect to release
within a couple of months at most. This feature was the last one
expected in 3.3.
This is currently on gnu.org in the token-i18n branch, and available
in these tarballs:
https://www.lrde.epita.fr/~akim/private/bison/bison-3.2.1.153-3e2f3.tar.gz
https://www.lrde.epita.fr/~akim/private/bison/bison-3.2.1.153-3e2f3.tar.xz
Cheers!
Akim Demaille (8):
  yacc.c: avoid negated if
  parsers: revamp the interface of yytnamerr
  tests: no longer play with trigraphs
  parsers: don't double escape tnames
  parsers: support translatable token aliases
  tests: check that internationalization of token works
  translate bison's own tokens
  regen
data/skeletons/glr.c | 90 +--
data/skeletons/lalr1.cc | 56 +-
data/skeletons/lalr1.d | 38 +-
data/skeletons/lalr1.java | 41 +-
data/skeletons/yacc.c | 75 +-
src/output.c | 33 +-
src/parse-gram.c | 1358 +++++++++++++++++++------------------
src/parse-gram.h | 141 ++--
src/parse-gram.y | 96 +--
src/scan-gram.l | 25 +-
src/symtab.c | 3 +-
src/symtab.h | 7 +-
tests/calc.at | 21 +-
tests/input.at | 10 +-
tests/javapush.at | 64 +-
tests/local.at | 5 +-
tests/regression.at | 38 +-
17 files changed, 1019 insertions(+), 1082 deletions(-)
--
2.20.0