Re: y.output format

help-bison
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: y.output format

From:	Akim Demaille
Subject:	Re: y.output format
Date:	26 Jun 2002 18:54:32 +0200
User-agent:	Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.4 (Honest Recruiter)
>>>>> "Matthew" == Matthew P Carter <address@hidden> writes:

Matthew> On 10 Jan 2002, Hans Aberg wrote:
>> 
>> At 18:54 +0100 2002/01/09, Bernard Granier wrote:
>> > Where can I find a simple tutorial which explains the y.output
>> > file format ?

In the CVS documentation there is:

Understanding Your Parser
=========================

   As documented elsewhere (*note The Bison Parser Algorithm:
Algorithm.)  Bison parsers are "shift/reduce automata".  In some cases
(much more frequent than one would hope), looking at this automaton is
required to tune or simply fix a parser.  Bison provides two different
representation of it, either textually or graphically (as a VCG file).

   The textual file is generated when the options `--report' or
`--verbose' are specified, see *Note Invoking Bison: Invocation.  Its
name is made by removing `.tab.c' or `.c' from the parser output file
name, and adding `.output' instead.  Therefore, if the input file is
`foo.y', then the parser file is called `foo.tab.c' by default.  As a
consequence, the verbose output file is called `foo.output'.

   The following grammar file, `calc.y', will be used in the sequel:

     %token NUM STR
     %left '+' '-'
     %left '*'
     %%
     exp: exp '+' exp
        | exp '-' exp
        | exp '*' exp
        | exp '/' exp
        | NUM
        ;
     useless: STR;
     %%

   `bison' reports that `calc.y contains 1 useless nonterminal and 1
useless rule' and that `calc.y contains 7 shift/reduce conflicts'.
When given `--report=state', in addition to `calc.tab.c', it creates a
file `calc.output' with contents detailed below.  The order of the
output and the exact presentation might vary, but the interpretation is
the same.

   The first section includes details on conflicts that were solved
thanks to precedence and/or associativity:

     Conflict in state 8 between rule 2 and token '+' resolved as reduce.
     Conflict in state 8 between rule 2 and token '-' resolved as reduce.
     Conflict in state 8 between rule 2 and token '*' resolved as shift.
...

The next section lists states that still have conflicts.

     State 8 contains 1 shift/reduce conflict.
     State 9 contains 1 shift/reduce conflict.
     State 10 contains 1 shift/reduce conflict.
     State 11 contains 4 shift/reduce conflicts.

The next section reports useless tokens, nonterminal and rules.  Useless
nonterminals and rules are removed in order to produce a smaller parser,
but useless tokens are preserved, since they might be used by the
scanner (note the difference between "useless" and "not used" below):

     Useless nonterminals:
        useless
     
     Terminals which are not used:
        STR
     
     Useless rules:
     #6     useless: STR;

The next section reproduces the exact grammar that Bison used:

     Grammar
     
       Number, Line, Rule
         0   5 $axiom -> exp $
         1   5 exp -> exp '+' exp
         2   6 exp -> exp '-' exp
         3   7 exp -> exp '*' exp
         4   8 exp -> exp '/' exp
         5   9 exp -> NUM

and reports the uses of the symbols:

     Terminals, with rules where they appear
     
     $ (0) 0
     '*' (42) 3
     '+' (43) 1
     '-' (45) 2
     '/' (47) 4
     error (256)
     NUM (258) 5
     
     Nonterminals, with rules where they appear
     
     $axiom (8)
         on left: 0
     exp (9)
         on left: 1 2 3 4 5, on right: 0 1 2 3 4

Bison then proceeds onto the automaton itself, describing each state
with it set of "items", also known as "pointed rules".  Each item is a
production rule together with a point (marked by `.') that the input
cursor.

     state 0
     
         $axiom  ->  . exp $   (rule 0)
     
         NUM    shift, and go to state 1
     
         exp    go to state 2

   This reads as follows: "state 0 corresponds to being at the very
beginning of the parsing, in the initial rule, right before the start
symbol (here, `exp').  When the parser returns to this state right
after having reduced a rule that produced an `exp', the control flow
jumps to state 2.  If there is no such transition on a nonterminal
symbol, and the lookahead is a `NUM', then this token is shifted on the
parse stack, and the control flow jumps to state 1.  Any other
lookahead triggers a parse error."

   Even though the only active rule in state 0 seems to be rule 0, the
report lists `NUM' as a lookahead symbol because `NUM' can be at the
beginning of any rule deriving an `exp'.  By default Bison reports the
so-called "core" or "kernel" of the item set, but if you want to see
more detail you can invoke `bison' with `--report=itemset' to list all
the items, include those that can be derived:

     state 0
     
         $axiom  ->  . exp $   (rule 0)
         exp  ->  . exp '+' exp   (rule 1)
         exp  ->  . exp '-' exp   (rule 2)
         exp  ->  . exp '*' exp   (rule 3)
         exp  ->  . exp '/' exp   (rule 4)
         exp  ->  . NUM   (rule 5)
     
         NUM         shift, and go to state 1
     
         exp         go to state 2

In the state 1...

     state 1
     
         exp  ->  NUM .   (rule 5)
     
         $default       reduce using rule 5 (exp)

the rule 5, `exp: NUM;', is completed.  Whatever the lookahead
(`$default'), the parser will reduce it.  If it was coming from state
0, then, after this reduction it will return to state 0, and will jump
to state 2 (`exp: go to state 2').

     state 2
     
         $axiom  ->  exp . $   (rule 0)
         exp  ->  exp . '+' exp   (rule 1)
         exp  ->  exp . '-' exp   (rule 2)
         exp  ->  exp . '*' exp   (rule 3)
         exp  ->  exp . '/' exp   (rule 4)
     
         $      shift, and go to state 3
         '+'    shift, and go to state 4
         '-'    shift, and go to state 5
         '*'    shift, and go to state 6
         '/'    shift, and go to state 7

In state 2, the automaton can only shift a symbol.  For instance,
because of the item `exp -> exp . '+' exp', if the lookahead if `+', it
will be shifted on the parse stack, and the automaton control will jump
to state 4, corresponding to the item `exp -> exp '+' . exp'.  Since
there is no default action, any other token than those listed above
will trigger a parse error.

   The state 3 is named the "final state", or the "accepting state":

     state 3
     
         $axiom  ->  exp $ .   (rule 0)
     
         $default       accept

the initial rule is completed (the start symbol and the end of input
were read), the parsing exits successfully.

   The interpretation of states 4 to 7 is straightforward, and is left
to the reader.

     state 4
     
         exp  ->  exp '+' . exp   (rule 1)
     
         NUM    shift, and go to state 1
     
         exp    go to state 8
     
     state 5
     
         exp  ->  exp '-' . exp   (rule 2)
     
         NUM    shift, and go to state 1
     
         exp    go to state 9
     
     state 6
     
         exp  ->  exp '*' . exp   (rule 3)
     
         NUM    shift, and go to state 1
     
         exp    go to state 10
     
     state 7
     
         exp  ->  exp '/' . exp   (rule 4)
     
         NUM    shift, and go to state 1
     
         exp    go to state 11

   As was announced in beginning of the report, `State 8 contains 1
shift/reduce conflict':

     state 8
     
         exp  ->  exp . '+' exp   (rule 1)
         exp  ->  exp '+' exp .   (rule 1)
         exp  ->  exp . '-' exp   (rule 2)
         exp  ->  exp . '*' exp   (rule 3)
         exp  ->  exp . '/' exp   (rule 4)
     
         '*'    shift, and go to state 6
         '/'    shift, and go to state 7
     
         '/'    [reduce using rule 1 (exp)]
         $default       reduce using rule 1 (exp)

   Indeed, there are two actions associated to the lookahead `/':
either shifting (and going to state 7), or reducing rule 1.  The
conflict means that either the grammar is ambiguous, or the parser lacks
information to make the right decision.  Indeed the grammar is
ambiguous, as, since we did not specify the precedence of `/', the
sentence `NUM + NUM / NUM' can be parsed as `NUM + (NUM / NUM)', which
corresponds to shifting `/', or as `(NUM + NUM) / NUM', which
corresponds to reducing rule 1.

   Because in LALR(1) parsing a single decision can be made, Bison
arbitrarily chose to disable the reduction, see *Note Shift/Reduce
Conflicts: Shift/Reduce.  Discarded actions are reported in between
square brackets.

   Note that all the previous states had a single possible action:
either shifting the next token and going to the corresponding state, or
reducing a single rule.  In the other cases, i.e., when shifting _and_
reducing is possible or when _several_ reductions are possible, the
lookahead is required to select the action.  State 8 is one such state:
if the lookahead is `*' or `/' then the action is shifting, otherwise
the action is reducing rule 1.  In other words, the first two items,
corresponding to rule 1, are not eligible when the lookahead is `*',
since we specified that `*' has higher precedence that `+'.  More
generally, some items are eligible only with some set of possible
lookaheads.  When run with `--report=lookahead', Bison specifies these
lookaheads:

     state 8
     
         exp  ->  exp . '+' exp  [$, '+', '-', '/']   (rule 1)
         exp  ->  exp '+' exp .  [$, '+', '-', '/']   (rule 1)
         exp  ->  exp . '-' exp   (rule 2)
         exp  ->  exp . '*' exp   (rule 3)
         exp  ->  exp . '/' exp   (rule 4)
     
         '*'         shift, and go to state 6
         '/'         shift, and go to state 7
     
         '/'         [reduce using rule 1 (exp)]
         $default    reduce using rule 1 (exp)

   The remaining states are similar:

     state 9
     
         exp  ->  exp . '+' exp   (rule 1)
         exp  ->  exp . '-' exp   (rule 2)
         exp  ->  exp '-' exp .   (rule 2)
         exp  ->  exp . '*' exp   (rule 3)
         exp  ->  exp . '/' exp   (rule 4)
     
         '*'    shift, and go to state 6
         '/'    shift, and go to state 7
     
         '/'    [reduce using rule 2 (exp)]
         $default       reduce using rule 2 (exp)
     
     state 10
     
         exp  ->  exp . '+' exp   (rule 1)
         exp  ->  exp . '-' exp   (rule 2)
         exp  ->  exp . '*' exp   (rule 3)
         exp  ->  exp '*' exp .   (rule 3)
         exp  ->  exp . '/' exp   (rule 4)
     
         '/'    shift, and go to state 7
     
         '/'    [reduce using rule 3 (exp)]
         $default       reduce using rule 3 (exp)
     
     state 11
     
         exp  ->  exp . '+' exp   (rule 1)
         exp  ->  exp . '-' exp   (rule 2)
         exp  ->  exp . '*' exp   (rule 3)
         exp  ->  exp . '/' exp   (rule 4)
         exp  ->  exp '/' exp .   (rule 4)
     
         '+'    shift, and go to state 4
         '-'    shift, and go to state 5
         '*'    shift, and go to state 6
         '/'    shift, and go to state 7
     
         '+'    [reduce using rule 4 (exp)]
         '-'    [reduce using rule 4 (exp)]
         '*'    [reduce using rule 4 (exp)]
         '/'    [reduce using rule 4 (exp)]
         $default       reduce using rule 4 (exp)

Observe that state 11 contains conflicts due to the lack of precedence
of `/' wrt `+', `-', and `*', but also because the associativity of `/'
is not specified.
[Prev in Thread]
Current Thread
[Next in Thread]
Re: y.output format, Matthew P. Carter 98, 2002/06/21
- Re: y.output format, Hans Aberg, 2002/06/21
- Re: y.output format, Akim Demaille <=
Prev by Date: Re: Bison on Windows NT/2000?
Next by Date: Parser in multi-thread environment
Previous by thread: Re: y.output format
Next by thread: Memory Leak
Index(es):
- Date
- Thread