RE: Getting involved in Bison

From: Morales Cayuela, Victor (NSB - CN/Hangzhou)
Subject: RE: Getting involved in Bison
Date: Wed, 16 Oct 2019 05:29:54 +0000


Since this is the first time I am contributing to this project, I would like
to start with something easy. First I'd like to get used to the way of
working: code style, reviews, testing, etc. As a first contribution, I could
do the graph generator clean-up that you mentioned. I have already had a look
at the TODO list and other issues, but I can't figure out which other items
might also be easy to start with.

About my skills, I can write quite good C/C++ code (C++14); I have long
experience with these two languages in projects of millions of lines. I am a
code reviewer in my company and I have received a few awards, so this part
shouldn't be a problem. I haven't used m4 before, but I will start learning
it asap.

By the way, how much time do you usually allow for a feature to be delivered?
Could you also let me know how to check out the project's source code? Do I
need to register first with some git repository? I've never worked on open
source projects, so I'm not really sure how they are managed.


-----Original Message-----
From: Akim Demaille <address@hidden> 
Sent: Tuesday, October 15, 2019 2:42 PM
To: Paul Eggert <address@hidden>
Cc: Morales Cayuela, Victor (NSB - CN/Hangzhou) <address@hidden>; Bison Bugs 
Subject: Re: Getting involved in Bison

Hi Victor!

> On 15 Oct 2019, at 06:19, Paul Eggert <address@hidden> wrote:
> On 10/14/19 7:12 PM, Morales Cayuela, Victor (NSB - CN/Hangzhou) wrote:
>> Could you let me know in which areas you would need help?
> Thanks for volunteering. Akim is the best person to ask.

Thanks :)

> Also, I suggest looking at Bison's TODO file for some ideas.
> https://git.savannah.gnu.org/cgit/bison.git/tree/TODO

Which was the impetus I needed to update it, see below.

For a small project, Bison is quite big, and it requires really different
skills depending on what you, Victor, would like to work on.  I strongly
recommend starting with simple things (simple != dumb).

On the backend side (aka skeletons), in C++, how about implementing push
parsers?  That would be very useful in several projects I know.  It is
moderately difficult to implement "by hand", but you'll certainly find that m4
is a weird beast.  One path would be to generate a usual pull parser for, say,
arithmetic, rework it by hand into a push parser, and later see how to move
these changes into lalr1.cc.
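To make the pull/push distinction concrete, here is a minimal hand-written sketch in C. It is not generated by Bison and is not an LR parser. It handles a toy grammar, `sum: NUM ('+' NUM)* END`, and every name in it (`sum_pstate`, `sum_push`, `PUSH_MORE`, ...) is hypothetical. The inversion is what matters. State that would be local variables of a pull parser's loop moves into a struct that survives between calls. The parser never calls the scanner. Instead, the caller hands it one token at a time and gets back "need more input", "done", or "error".

```c
#include <assert.h>

/* Hypothetical token kinds for the toy grammar: sum: NUM ('+' NUM)* END */
enum sum_token { SUM_NUM, SUM_PLUS, SUM_END };

/* Hypothetical status codes returned after each pushed token.  */
enum push_status { PUSH_MORE, PUSH_DONE, PUSH_ERROR };

/* In a pull parser these would be locals of the parse function.  In a
   push parser they must persist between calls, so they live here.  */
struct sum_pstate
{
  int expect_num;   /* Nonzero if the next token must be a NUM.  */
  long total;       /* Running sum, i.e., the semantic value.  */
};

void
sum_pstate_init (struct sum_pstate *ps)
{
  ps->expect_num = 1;
  ps->total = 0;
}

/* Feed one token to the parser; VALUE is only used for SUM_NUM.
   The caller (e.g., an event loop) drives the parser, not the
   other way around.  */
enum push_status
sum_push (struct sum_pstate *ps, enum sum_token tok, long value)
{
  assert (ps);
  if (ps->expect_num)
    {
      if (tok != SUM_NUM)
        return PUSH_ERROR;
      ps->total += value;
      ps->expect_num = 0;
      return PUSH_MORE;
    }
  switch (tok)
    {
    case SUM_PLUS:
      ps->expect_num = 1;
      return PUSH_MORE;
    case SUM_END:
      return PUSH_DONE;
    default:
      return PUSH_ERROR;
    }
}
```

Bison's C push parsers (`%define api.push-pull push`) are organized around the same idea, with a `yypstate` object and a `yypush_parse` function. The C++ work would be to offer an equivalent in lalr1.cc.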

In bison itself (the generator), for a simple start, I would recommend cleaning
up the graph generation.  Today it's sort of OOP, with an abstract interface
for graphs and a concrete implementation for Dot.  This is because decades ago
we supported a format called VCG, which has since disappeared.  I think we
should flatten this into a direct interface for Dot, removing all the useless
abstraction.
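Here is a minimal C sketch of what a "flattened" Dot output could look like. It uses plain functions that write Dot directly to a `FILE *`, with no abstract graph interface in between. The function names (`dot_begin`, `dot_node`, `dot_edge`, `dot_end`, `dot_quote`) are hypothetical and are not Bison's actual API.

```c
#include <assert.h>
#include <stdio.h>

/* Write S as a Dot double-quoted string.  In the DOT language the only
   character escaped inside a quoted string is '"'.  */
void
dot_quote (FILE *out, char const *s)
{
  assert (out && s);
  fputc ('"', out);
  for (; *s; ++s)
    {
      if (*s == '"')
        fputc ('\\', out);
      fputc (*s, out);
    }
  fputc ('"', out);
}

/* Open a directed graph named NAME.  */
void
dot_begin (FILE *out, char const *name)
{
  fputs ("digraph ", out);
  dot_quote (out, name);
  fputs ("\n{\n", out);
}

/* One node per parser state, identified by its number.  */
void
dot_node (FILE *out, int id, char const *label)
{
  fprintf (out, "  %d [label=", id);
  dot_quote (out, label);
  fputs ("]\n", out);
}

/* One edge per transition, labeled with the symbol.  */
void
dot_edge (FILE *out, int from, int to, char const *label)
{
  fprintf (out, "  %d -> %d [label=", from, to);
  dot_quote (out, label);
  fputs ("]\n", out);
}

void
dot_end (FILE *out)
{
  fputs ("}\n", out);
}
```

Now that Dot is the only format left, callers can use these functions directly. No per-format dispatch is needed.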

There are many more possible things, but it really depends on what you'd like
to work on, and how fluent you are in C (for bison the generator) and m4 (for
the skeletons).

diff --git a/TODO b/TODO
index f3f08ce1..d2c56b73 100644
--- a/TODO
+++ b/TODO
@@ -7,9 +7,6 @@ breaks.
 Also, we seem to teach YYPRINT very early on, although it should be
 considered deprecated: %printer is superior.
-** glr.cc
-move glr.c into the yy namespace
 ** improve syntax errors (UTF-8, internationalization)
 Bison depends on the current locale.  For instance:
@@ -58,7 +55,7 @@ Maybe we should exhibit the YYUNDEFTOK token.  It could also be assigned a
 semantic value so that yyerror could be used to report invalid
 * Bison 3.6
-** Unit rules
+** Unit rules / Injection rules (Akim Demaille)
 Maybe we could expand unit rules (or "injections", see
 https://homepages.cwi.nl/~daybuild/daily-books/syntax/2-sdf/sdf.html), i.e.,
 transform
@@ -77,10 +74,12 @@ Practice' is impossible to find, but according to 'Parsing Techniques: a
 Practical Guide', it includes information about this issue.  Does anybody
 have it?
-** Injection rules
-See above.
+** clean up (Akim Demaille)
+Do not work on these items now, as I (Akim) have branches with a lot of 
+changes in this area (hitting several files), and no desire to have to 
+fix conflicts.  Addressing these items will happen after my branches 
+have been merged.
-** clean up
 *** lalr.c
 Introduce a goto struct, and use it in place of from_state/to_state.
 Rename states1 as path, length as pathlen.
@@ -130,6 +129,84 @@ $ ./tests/testsuite -l | grep errors | sed q
   38: input.at:1730      errors
 * Short term
+** Stop indentation in diagnostics
+Before Bison 2.7, we printed "flatly" the dependencies in long diagnostics:
+    input.y:2.7-12: %type redeclaration for exp
+    input.y:1.7-12: previous declaration
+In Bison 2.7, we indented them
+    input.y:2.7-12: error: %type redeclaration for exp
+    input.y:1.7-12:     previous declaration
+Later we quoted the source in the diagnostics, and today we have:
+    /tmp/foo.y:1.12-14: warning: symbol FOO redeclared [-Wother]
+        1 | %token FOO FOO
+          |            ^~~
+    /tmp/foo.y:1.8-10:      previous declaration
+        1 | %token FOO FOO
+          |        ^~~
+The indentation is no longer helping.  We should probably get rid of 
+it, or maybe keep it only when -fno-caret. GCC displays this as a "note":
+    $ g++-mp-9 -Wall /tmp/foo.c -c
+    /tmp/foo.c:1:10: error: redefinition of 'int foo'
+        1 | int foo, foo;
+          |          ^~~
+    /tmp/foo.c:1:5: note: 'int foo' previously declared here
+        1 | int foo, foo;
+          |     ^~~
+Likewise for Clang, contrary to what I believed (because "note:" is 
+written in black, so it doesn't show in my terminal :-)
+    $ clang++-mp-8.0 -Wall /tmp/foo.c -c
+    clang: warning: treating 'c' input as 'c++' when in C++ mode, this behavior is deprecated [-Wdeprecated]
+    /tmp/foo.c:1:10: error: redefinition of 'foo'
+    int foo, foo;
+             ^
+    /tmp/foo.c:1:5: note: previous definition is here
+    int foo, foo;
+        ^
+    1 error generated.
+** Better design for diagnostics
+The current implementation of diagnostics is adhoc, it grew 
+organically.  It works as a series of calls to several functions, with 
+dependency of the latter calls on the former.  For instance:
+      complain (&sym->location,
+                sym->content->status == needed ? complaint : Wother,
+                _("symbol %s is used, but is not defined as a token"
+                  " and has no rules; did you mean %s?"),
+                quote_n (0, sym->tag),
+                quote_n (1, best->tag));
+      if (feature_flag & feature_caret)
+        location_caret_suggestion (sym->location, best->tag, stderr);
+We should rewrite this in a more FP way:
+1. build a rich structure that denotes the (complete) diagnostic.
+   "Complete" in the sense that it also contains the suggestions, the list
+   of possible matches, etc.
+2. send this to the pretty-printing routine.  The diagnostic structure
+   should be sufficient so that we can generate all the 'format' of
+   diagnostics, including the fixits.
+If properly done, this diagnostic module can be detached from Bison and 
+be put in gnulib.  It could be used, for instance, for errors caught by 
+There's certainly already something alike in GCC.  At least that's the 
+impression I get from reading the "-fdiagnostics-format=FORMAT" part of 
 ** consistency
 token vs terminal
@@ -139,11 +216,10 @@ itself uses int (for yylen for instance), yet stack is based on size_t.
 Maybe locations should also move to ints.
-** C
-Introduce state_type rather than spreading yytype_int16 everywhere?
-** glr.c
-yyspaceLeft should probably be a pointer diff.
+Paul Eggert already covered most of this.  But before publishing these 
+changes, we need to ask our C++ users if they agree with that change, 
+or if we need some migration path.  Could be a %define variable, or 
+simply %require "3.5".
 ** Graphviz display code thoughts
 The code for the --graph option is over two files: print_graph, and
@@ -164,9 +240,6 @@ Little effort seems to have been given to factoring these files and their
 print{,-xml} counterpart. We would very much like to re-use the pretty format
 of states from .output for the graphs, etc.
-Also, the underscore in print_graph.[ch] isn't very fitting considering the 
-dashes in the other filenames.
 Since graphviz dies on medium-to-big grammars, maybe consider an other tool?
 ** push-parser
@@ -224,11 +297,13 @@ since it is no longer bound to a particular parser, it's just a
 (standalone symbol).
 * Various
-** Rewrite glr.cc in C++
+** Rewrite glr.cc in C++ (Valentin Tolmer)
 As a matter of fact, it would be very interesting to see how much we can  
share between lalr1.cc and glr.cc.  Most of the skeletons should be common.
 It would be a very nice source of inspiration for the other languages.
+Valentin Tolmer is working on this.
 Defined to 256, but not used, not documented.  Probably the token
 number for the error token, which POSIX wants to be 256, but which
@@ -298,10 +373,21 @@ other improvements and also made it faster (probably because memory
 management is performed once instead of three times).  I suggest that
 we do the same in
+(Some time later): it's also very nice to have three stacks: it's more 
+dense as we don't lose bits to padding.  For instance the typical stack 
+for states will use 8 bits, while it is likely to consume 32 bits in a struct.
+We need trustworthy benchmarks for Bison, for all our backends.  Akim 
+has a few things scattered around; we need to put them in the repo, and 
+make them more useful.
 ** yysyntax_error
 The code bw glr.c and yacc.c is really alike, we can certainly factor
 some
+This should be worked on when we also address the expected improvements 
+for error generation (e.g., i18n).
 * Report
@@ -341,7 +427,26 @@ LORIA, INRIA Nancy - Grand Est, Nancy, France
 * Extensions
 ** Multiple start symbols
-Would be very useful when parsing closely related languages.
+Would be very useful when parsing closely related languages.  The idea 
+is to declare several start symbols, for instance
+    %start stmt expr
+    %%
+    stmt: ...
+    expr: ...
+and to generate parse(), parse_stmt() and parse_expr().  Technically, 
+the above grammar would be transformed into
+   %start yy_start
+   %%
+   yy_start: YY_START_STMT stmt | YY_START_EXPR expr
+so that there are no new conflicts in the grammar (as would undoubtedly 
+happen with yy_start: stmt | expr).  Then adjust the skeletons so that 
+this initial token (YY_START_STMT, YY_START_EXPR) be shifted first in 
+the corresponding parse function.
 ** Better error messages
 The users are not provided with enough tools to forge their error messages.
@@ -359,6 +464,12 @@ should make this reasonably easy to implement.
 Bruce Mardle <address@hidden>
+However, there are many other things to do before having such a 
+feature, because I don't want a % equivalent to #include (which we all 
+learned to hate).  I want something that builds "modules" of grammars, 
+and assembles them together, paying attention to keep separate bits 
+separated, in pseudo name spaces.
 ** Push parsers
 There is demand for push parsers in Java and C++.  And GLR I guess.
@@ -385,6 +496,10 @@ must be in the scanner: we must not parse what is in a switched off
 part of %if.  Akim Demaille thinks it should be in the parser, so as
 to avoid falling into another CPP mistake.
+(Later): I'm sure there's actually good case for this.  People who need 
+that feature can use m4/cpp on top of Bison.  I don't think it is worth 
+the trouble in Bison itself.
 ** XML Output
 There are couple of available extensions of Bison targeting some XML
 output.  Some day we should consider including them.  One issue is
@@ -404,6 +519,9 @@ XML output for GNU Bison
+Andrew Myers and Vincent Imbimbo are working on this item, see
 * Coding system independence
 Paul notes:
@@ -433,6 +551,7 @@ It is unfortunate that there is a total order for precedence.  It
 makes it impossible to have modular precedence information.  We should
 move to partial orders (sounds like series/parallel orders to me).
+This is a prerequisite for modules.
 * $undefined
 From Hans:
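
The "Better design for diagnostics" and "Stop indentation in diagnostics" entries in the TODO patch above could be combined roughly as follows. This is a minimal C sketch, and the types `struct diagnostic` and `struct diag_note` and the function `diag_print` are hypothetical, not part of Bison. The whole diagnostic is first built as plain data, including its secondary locations and an optional suggestion. It is then handed to a single printer, which reports secondary locations as GCC-style "note:" lines instead of indenting them.

```c
#include <assert.h>
#include <stdio.h>

/* A secondary location attached to a diagnostic (hypothetical type).  */
struct diag_note
{
  char const *loc;       /* E.g., "foo.y:1.8-10".  */
  char const *message;   /* E.g., "previous declaration".  */
};

/* A complete diagnostic, built before anything is printed.  */
struct diagnostic
{
  char const *loc;
  char const *severity;            /* "error", "warning", ...  */
  char const *message;
  char const *suggestion;          /* Suggested replacement, or NULL.  */
  struct diag_note const *notes;   /* Secondary locations.  */
  int nnotes;
};

/* The single pretty-printing routine.  It only reads D, so other output
   styles (carets, fix-its, ...) could be handled here without touching
   the code that builds diagnostics.  */
void
diag_print (FILE *out, struct diagnostic const *d)
{
  assert (out && d);
  fprintf (out, "%s: %s: %s", d->loc, d->severity, d->message);
  if (d->suggestion)
    fprintf (out, "; did you mean %s?", d->suggestion);
  fputc ('\n', out);
  for (int i = 0; i < d->nnotes; ++i)
    fprintf (out, "%s: note: %s\n", d->notes[i].loc, d->notes[i].message);
}
```

Because `diag_print` sees the complete diagnostic, a choice such as keeping indentation only under `-fno-caret` would affect this one function and nothing else.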

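The token-injection idea from the "Multiple start symbols" entry can also be sketched by hand. In this minimal C sketch, a hand-written recursive-descent function stands in for the generated LR parser. The toy rules for `stmt` and `expr`, the token kinds, and the names `start_lexer`, `parse_stmt`, and `parse_expr` are all hypothetical. The essential part is that each entry point delivers its injected start token (`YY_START_STMT` or `YY_START_EXPR`) before any real token, so a single grammar rooted at `yy_start` serves both entry points.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds; the last two are the injected start tokens.  */
enum token { TOK_EOF, TOK_ID, TOK_SEMI, YY_START_STMT, YY_START_EXPR };

/* Returns the injected start token once, then the real tokens.  */
struct start_lexer
{
  enum token start;          /* Injected token, returned first.  */
  int start_pending;         /* Nonzero until START has been returned.  */
  enum token const *toks;    /* Real tokens, terminated by TOK_EOF.  */
  size_t pos;
};

static enum token
next_token (struct start_lexer *lx)
{
  if (lx->start_pending)
    {
      lx->start_pending = 0;
      return lx->start;
    }
  enum token t = lx->toks[lx->pos];
  if (t != TOK_EOF)
    ++lx->pos;
  return t;
}

/* Toy grammar (hypothetical):
     yy_start: YY_START_STMT stmt | YY_START_EXPR expr
     stmt: ID ';'
     expr: ID
   Returns 0 on success and 1 on a syntax error, as yyparse does.  */
static int
parse_start (struct start_lexer *lx)
{
  switch (next_token (lx))
    {
    case YY_START_STMT:
      if (next_token (lx) != TOK_ID || next_token (lx) != TOK_SEMI)
        return 1;
      break;
    case YY_START_EXPR:
      if (next_token (lx) != TOK_ID)
        return 1;
      break;
    default:
      return 1;
    }
  return next_token (lx) == TOK_EOF ? 0 : 1;
}

int
parse_stmt (enum token const *toks)
{
  assert (toks);
  struct start_lexer lx = { YY_START_STMT, 1, toks, 0 };
  return parse_start (&lx);
}

int
parse_expr (enum token const *toks)
{
  assert (toks);
  struct start_lexer lx = { YY_START_EXPR, 1, toks, 0 };
  return parse_start (&lx);
}
```

The leading token selects the branch of `yy_start` immediately. That is the property the TODO relies on to avoid the new conflicts that `yy_start: stmt | expr` would likely introduce.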