

Re: Getting involved in Bison

From: Adrian Vogelsgesang
Subject: Re: Getting involved in Bison
Date: Wed, 16 Oct 2019 07:00:54 +0000
User-agent: Microsoft-MacOutlook/10.10.b.190609

Hi Victor,

I'm glad to hear you want to contribute to Bison, and I'm looking forward to it! :)

You can download the repository via CVS 
(https://savannah.gnu.org/cvs/?group=bison) or via Git.
The anonymous option is sufficient, as you will be sending your patches via the 
mailing list anyway.
One of the maintainers will then review the patch and merge it into the 
official repository.

A few tips regarding Git (I personally only have experience with Git, not with 
CVS):
I recommend keeping your own fork of the repository somewhere, e.g. on GitHub. 
That way you have somewhere to push your changes as a backup, in case you lose 
your local machine for whatever reason. You would then have to manage two remote 
Git repositories, your fork and the official upstream repo, but there's plenty 
of documentation out there on how to do that.
To format changes for the mailing list, I used “git format-patch”.

My first feature (Lookahead Correction in C++) took a few weeks to make it into 
the mainline, but that's mostly my fault: a few issues were caught in the code 
review and I didn't have the time to address them right away. Akim was really 
helpful in getting that patch onto master, so there is no reason to be afraid 
of the reviews.


From: bug-bison <bug-bison-bounces+avogelsgesang=address@hidden> on behalf of 
"Morales Cayuela, Victor (NSB - CN/Hangzhou)" <address@hidden>
Date: Wednesday, 16 October 2019 at 07:46
To: Akim Demaille <address@hidden>, Paul Eggert <address@hidden>
Cc: Bison Bugs <address@hidden>
Subject: RE: Getting involved in Bison


Since this is the first time I am contributing to this project, I would like to 
start with something easy. First I'd like to get used to the way of working: 
code style, review, testing, etc. As a first contact, I could do the graph 
generator cleanup that you mentioned. I have already had a look at the TODO 
list and other issues, but I can't figure out which other ones might also be 
easy to start with.

As for my skills, I write quite good C/C++ code (C++14) and have long 
experience with both languages in projects of millions of lines. I am a code 
reviewer at my company and have received awards for it a few times, so this 
part shouldn't be a problem. I haven't used m4 before, but I will start 
learning it ASAP.

By the way, how long do you usually estimate for a feature to be delivered? 
Could you also let me know how to check out the source code? Do I need to 
register first with some Git repository? I've never worked on open source 
projects, so I'm not really sure how they are managed.


-----Original Message-----
From: Akim Demaille <address@hidden>
Sent: Tuesday, October 15, 2019 2:42 PM
To: Paul Eggert <address@hidden>
Cc: Morales Cayuela, Victor (NSB - CN/Hangzhou) <address@hidden>; Bison Bugs 
Subject: Re: Getting involved in Bison

Hi Victor!

> On 15 Oct 2019 at 06:19, Paul Eggert <address@hidden> wrote:
> On 10/14/19 7:12 PM, Morales Cayuela, Victor (NSB - CN/Hangzhou) wrote:
>> Could you let me know in which areas you would need help?
> Thanks for volunteering. Akim is the best person to ask.

Thanks :)

> Also, I suggest looking at Bison's TODO file for some ideas.
> https://git.savannah.gnu.org/cgit/bison.git/tree/TODO

Which was the impetus I needed to update it, see below.

For a small project, Bison is quite big, and it requires very different skills 
depending on where you, Victor, would like to work. I strongly recommend 
starting with simple things (simple != dummy).

On the backend side (aka the skeletons), in C++, how about implementing push 
parsers? That would be very useful in several projects I know. It's moderately 
difficult to implement "by hand", but you'll certainly find that m4 is a weird 
beast. One path would be to generate a usual pull parser for, say, arithmetic, 
rework it by hand into a push parser, and later see how to move these changes 
into lalr1.cc.
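To make the pull/push distinction concrete, here is a minimal hand-written sketch in C. It is not Bison output and not lalr1.cc: it uses a tiny state machine for sums like "1 + 2 - 4" rather than LR tables, and every name in it (calc_pstate, calc_push, calc_parse_tokens, PUSH_MORE, ...) is hypothetical. What it does show is the inversion of control: in a pull parser the parser calls the scanner for each token, whereas in a push parser the caller hands tokens in one at a time, so everything the parser must remember has to live in an explicit state object between calls.

```c
#include <stddef.h>

/* Hypothetical token kinds for a tiny "NUM (('+' | '-') NUM)*" language. */
enum token_kind { TOK_NUM, TOK_PLUS, TOK_MINUS, TOK_END };

/* Hypothetical status codes: need more input, accepted, or syntax error. */
enum push_status { PUSH_MORE, PUSH_ACCEPT, PUSH_ERROR };

/* All parser state lives here, because control goes back to the
   caller after every single token. */
struct calc_pstate
{
  int expect_num;   /* nonzero if the next token must be a number */
  long sign;        /* +1 or -1, applied to the next number */
  long result;      /* running value of the expression */
};

void calc_pstate_init (struct calc_pstate *ps)
{
  ps->expect_num = 1;
  ps->sign = 1;
  ps->result = 0;
}

/* Consume exactly one token (a hand-written state machine, not LR tables). */
enum push_status calc_push (struct calc_pstate *ps, enum token_kind kind,
                            long value)
{
  if (ps->expect_num)
    {
      if (kind != TOK_NUM)
        return PUSH_ERROR;
      ps->result += ps->sign * value;
      ps->expect_num = 0;
      return PUSH_MORE;
    }
  switch (kind)
    {
    case TOK_PLUS:
      ps->sign = 1;
      ps->expect_num = 1;
      return PUSH_MORE;
    case TOK_MINUS:
      ps->sign = -1;
      ps->expect_num = 1;
      return PUSH_MORE;
    case TOK_END:
      return PUSH_ACCEPT;
    default:          /* a number where an operator was expected */
      return PUSH_ERROR;
    }
}

/* Usage: the caller drives the parser (here from an array; in practice
   from a scanner loop or an input callback).  Returns PUSH_MORE if the
   tokens ran out before TOK_END, which means "push more later". */
enum push_status calc_parse_tokens (const enum token_kind *kinds,
                                    const long *values, size_t n,
                                    long *result)
{
  struct calc_pstate ps;
  enum push_status status = PUSH_MORE;
  calc_pstate_init (&ps);
  for (size_t i = 0; i < n && status == PUSH_MORE; ++i)
    status = calc_push (&ps, kinds[i], values[i]);
  if (status == PUSH_ACCEPT)
    *result = ps.result;
  return status;
}
```

A generated push parser keeps the LR state stack and lookahead in its state object instead of these three fields, but the calling convention (create state, push tokens until the parser stops asking for more) is the same idea.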

In Bison itself (the generator), for a simple start, I would recommend cleaning 
up the graph generation. Today it's sort of OOP, with an abstract interface for 
graphs and a concrete implementation for Dot. This is because decades ago we 
supported a format called VCG, which has since disappeared. I think we 
should flatten this to a direct interface for Dot, removing all the useless
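A minimal sketch of what a "direct interface for Dot" could look like, assuming the goal is plain functions that write Dot text to a FILE * with no abstract graph layer in between. The function names (dot_start_graph, dot_node, dot_edge, dot_finish_graph, dot_put_quoted) are hypothetical and do not describe Bison's current sources. Only double quotes are escaped; backslashes are passed through unchanged, and since Graphviz gives them meaning inside labels, real code would need more care there.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical flattened Dot emitter: since Dot is the only remaining
   output format, plain functions replace the abstract graph interface. */

/* Print TEXT as a Dot quoted string, escaping embedded double quotes. */
void dot_put_quoted (FILE *out, const char *text)
{
  fputc ('"', out);
  while (*text)
    {
      size_t run = strcspn (text, "\"");  /* length of the quote-free prefix */
      fwrite (text, 1, run, out);
      text += run;
      if (*text == '"')
        {
          fputs ("\\\"", out);
          ++text;
        }
    }
  fputc ('"', out);
}

void dot_start_graph (FILE *out, const char *name)
{
  fputs ("digraph ", out);
  dot_put_quoted (out, name);
  fputs ("\n{\n", out);
}

/* One node per parser state, identified by its number. */
void dot_node (FILE *out, int id, const char *label)
{
  fprintf (out, "  %d [label=", id);
  dot_put_quoted (out, label);
  fputs ("]\n", out);
}

/* One edge per transition, labeled with the symbol. */
void dot_edge (FILE *out, int from, int to, const char *label)
{
  fprintf (out, "  %d -> %d [label=", from, to);
  dot_put_quoted (out, label);
  fputs ("]\n", out);
}

void dot_finish_graph (FILE *out)
{
  fputs ("}\n", out);
}
```

With a single backend, a table of function pointers buys nothing; direct calls are easier to read and to grep, and the escaping logic lives in one place.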

There are many more possible things, but it really depends on what you'd like to 
work on, and how fluent you are in C (for Bison, the generator) and m4 (the

diff --git a/TODO b/TODO
index f3f08ce1..d2c56b73 100644
--- a/TODO
+++ b/TODO
@@ -7,9 +7,6 @@ breaks.
Also, we seem to teach YYPRINT very early on, although it should be considered 
deprecated: %printer is superior.

-** glr.cc
-move glr.c into the yy namespace
** improve syntax errors (UTF-8, internationalization)
Bison depends on the current locale. For instance:

@@ -58,7 +55,7 @@ Maybe we should exhibit the YYUNDEFTOK token. It could also 
be assigned a semantic value so that yyerror could be used to report invalid 

* Bison 3.6
-** Unit rules
+** Unit rules / Injection rules (Akim Demaille)
Maybe we could expand unit rules (or "injections", see 
 i.e., transform
@@ -77,10 +74,12 @@ Practice' is impossible to find, but 
according to 'Parsing Techniques: a Practical Guide', it includes information 
about this issue. Does anybody have it?

-** Injection rules
-See above.
+** clean up (Akim Demaille)
+Do not work on these items now, as I (Akim) have branches with a lot of
+changes in this area (hitting several files), and no desire to have to
+fix conflicts. Addressing these items will happen after my branches
+have been merged.

-** clean up
*** lalr.c
Introduce a goto struct, and use it in place of from_state/to_state.
Rename states1 as path, length as pathlen.
@@ -130,6 +129,84 @@ $ ./tests/testsuite -l | grep errors | sed q
38: input.at:1730 errors

* Short term
+** Stop indentation in diagnostics
+Before Bison 2.7, we printed "flatly" the dependencies in long diagnostics:
+  input.y:2.7-12: %type redeclaration for exp
+  input.y:1.7-12: previous declaration
+In Bison 2.7, we indented them
+  input.y:2.7-12: error: %type redeclaration for exp
+  input.y:1.7-12:     previous declaration
+Later we quoted the source in the diagnostics, and today we have:
+  /tmp/foo.y:1.12-14: warning: symbol FOO redeclared [-Wother]
+      1 | %token FOO FOO
+        |            ^~~
+  /tmp/foo.y:1.8-10:     previous declaration
+      1 | %token FOO FOO
+        |        ^~~
+The indentation is no longer helping. We should probably get rid of
+it, or maybe keep it only when -fno-caret. GCC displays this as a "note":
+  $ g++-mp-9 -Wall /tmp/foo.c -c
+  /tmp/foo.c:1:10: error: redefinition of 'int foo'
+      1 | int foo, foo;
+        |          ^~~
+  /tmp/foo.c:1:5: note: 'int foo' previously declared here
+      1 | int foo, foo;
+        |     ^~~
+Likewise for Clang, contrary to what I believed (because "note:" is
+written in black, so it doesn't show in my terminal :-)
+  $ clang++-mp-8.0 -Wall /tmp/foo.c -c
+  clang: warning: treating 'c' input as 'c++' when in C++ mode, this behavior is deprecated [-Wdeprecated]
+  /tmp/foo.c:1:10: error: redefinition of 'foo'
+  int foo, foo;
+           ^
+  /tmp/foo.c:1:5: note: previous definition is here
+  int foo, foo;
+      ^
+  1 error generated.
+** Better design for diagnostics
+The current implementation of diagnostics is ad hoc: it grew
+organically. It works as a series of calls to several functions, with
+dependency of the latter calls on the former. For instance:
+  complain (&sym->location,
+            sym->content->status == needed ? complaint : Wother,
+            _("symbol %s is used, but is not defined as a token"
+              " and has no rules; did you mean %s?"),
+            quote_n (0, sym->tag),
+            quote_n (1, best->tag));
+  if (feature_flag & feature_caret)
+    location_caret_suggestion (sym->location, best->tag, stderr);
+We should rewrite this in a more FP way:
+1. build a rich structure that denotes the (complete) diagnostic.
+   "Complete" in the sense that it also contains the suggestions, the list
+   of possible matches, etc.
+2. send this to the pretty-printing routine. The diagnostic structure
+   should be sufficient so that we can generate all the 'formats' of
+   diagnostics, including the fixits.
+If properly done, this diagnostic module can be detached from Bison and
+be put in gnulib. It could be used, for instance, for errors caught by
+There's certainly already something alike in GCC. At least that's the
+impression I get from reading the "-fdiagnostics-format=FORMAT" part of
** consistency
token vs terminal

@@ -139,11 +216,10 @@ itself uses int (for yylen for instance), yet stack is 
based on size_t.

Maybe locations should also move to ints.

-** C
-Introduce state_type rather than spreading yytype_int16 everywhere?
-** glr.c
-yyspaceLeft should probably be a pointer diff.
+Paul Eggert already covered most of this. But before publishing these
+changes, we need to ask our C++ users if they agree with that change,
+or if we need some migration path. Could be a %define variable, or
+simply %require "3.5".

** Graphviz display code thoughts
The code for the --graph option is over two files: print_graph, and
@@ -164,9 +240,6 @@ Little effort seems to have been given to factoring these files and
their print{,-xml} counterpart. We would very much like to re-use the pretty 
format of states from .output for the graphs, etc.

-Also, the underscore in print_graph.[ch] isn't very fitting considering the 
-dashes in the other filenames.
Since graphviz dies on medium-to-big grammars, maybe consider an other tool?

** push-parser
@@ -224,11 +297,13 @@ since it is no longer bound to a particular parser, it's 
just a (standalone symbol).

* Various
-** Rewrite glr.cc in C++
+** Rewrite glr.cc in C++ (Valentin Tolmer)
As a matter of fact, it would be very interesting to see how much we can share 
between lalr1.cc and glr.cc. Most of the 
skeletons should be common.
It would be a very nice source of inspiration for the other languages.

+Valentin Tolmer is working on this.
Defined to 256, but not used, not documented. Probably the token number for the 
error token, which POSIX wants to be 256, but which
@@ -298,10 +373,21 @@ other 
improvements and also made it faster (probably because memory management is 
performed once instead of three times). I suggest that we do the same in yacc.c.

+(Some time later): it's also very nice to have three stacks: it's more
+dense as we don't lose bits to padding. For instance the typical stack
+for states will use 8 bits, while it is likely to consume 32 bits in a struct.
+We need trustworthy benchmarks for Bison, for all our backends. Akim
+has a few things scattered around; we need to put them in the repo, and
+make them more useful.
** yysyntax_error
The code bw glr.c and yacc.c is really alike, we can certainly factor some 

+This should be worked on when we also address the expected improvements
+for error generation (e.g., i18n).

* Report

@@ -341,7 +427,26 @@ LORIA, INRIA Nancy - Grand Est, Nancy, France

* Extensions
** Multiple start symbols
-Would be very useful when parsing closely related languages.
+Would be very useful when parsing closely related languages. The idea
+is to declare several start symbols, for instance
+ %start stmt expr
+ %%
+ stmt: ...
+ expr: ...
+and to generate parse(), parse_stmt() and parse_expr(). Technically,
+the above grammar would be transformed into
+ %start yy_start
+ %%
+ yy_start: YY_START_STMT stmt | YY_START_EXPR expr
+so that there are no new conflicts in the grammar (as would undoubtedly
+happen with yy_start: stmt | expr). Then adjust the skeletons so that
+this initial token (YY_START_STMT, YY_START_EXPR) be shifted first in
+the corresponding parse function.

** Better error messages
The users are not provided with enough tools to forge their error messages.
@@ -359,6 +464,12 @@ should make this reasonably easy to implement.
Bruce Mardle <address@hidden>

+However, there are many other things to do before having such a
+feature, because I don't want a % equivalent to #include (which we all
+learned to hate). I want something that builds "modules" of grammars,
+and assembles them together, paying attention to keep separate bits
+separated, in pseudo name spaces.
** Push parsers
There is demand for push parsers in Java and C++. And GLR I guess.

@@ -385,6 +496,10 @@ must be in the scanner: we must not parse what is in a 
switched off part of %if. Akim Demaille thinks it should be in the parser, so 
as to avoid falling into another CPP mistake.

+(Later): I'm sure there's actually good case for this. People who need
+that feature can use m4/cpp on top of Bison. I don't think it is worth
+the trouble in Bison itself.
** XML Output
There are couple of available extensions of Bison targeting some XML output. 
Some day we should consider including them. One issue is
@@ -404,6 +519,9 @@ XML output for GNU Bison 

+Andrew Myers and Vincent Imbimbo are working on this item, see
* Coding system independence
Paul notes:

@@ -433,6 +551,7 @@ It is unfortunate that there is a total order for 
precedence. It makes it impossible to have modular precedence information. We 
should move to partial orders (sounds like series/parallel orders to me).

+This is a prerequisite for modules.

* $undefined
From Hans:
