help-bison
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Odd parser behaviour


From: Tim Van Holder
Subject: Re: Odd parser behaviour
Date: Mon, 25 Sep 2006 08:49:13 +0200
User-agent: Thunderbird 1.5.0.7 (Windows/20060909)

Heiko Wundram wrote:
> Am Freitag, 22. September 2006 15:26 schrieb Tim Van Holder:
>> I would expect this to accept any input starting with A(B|C?D?E)+,
>> but in practice, it only accepts input starting with A(B|CDE)+,
>> because for the xyzzy nterm, it always tries to reduce the opt_C.
>>
>> >From the .output:
>>
>> state 1
>>
>>     2 bar: A . plus_xyzzy
>>     3 plus_xyzzy: . xyzzy
>>     4           | . plus_xyzzy xyzzy
>>     5 xyzzy: . B
>>     6      | . opt_C opt_D E
>>     7 opt_C: .  [D, E]
>>     8      | . C
>>
>>     B  shift, and go to state 4
>>     C  shift, and go to state 5
>>
>>     $default  reduce using rule 7 (opt_C)
>>
>>     plus_xyzzy  go to state 6
>>     xyzzy       go to state 7
>>     opt_C       go to state 8
> 
> But, all of this looks okay? On B or C, a shift occurs, on any other 
> lookahead, $default, in this case [ADE], a reduce (on opt_C) takes place, 
> which should be the behaviour you expect. In case there's another A in the 

Except that it actually takes the entire "opt_C opt_D E" path - if an A
is seen, it will reduce opt_C and opt_D then error out on the "E" rule,
while it should have reduced up to the "bar" nterm, and only try to
error out then (except that foo's action would have YYACCEPTed the
grammar before it reaches the point at which an error should be
produced).

> input, rule 12 should find that for you, and abort there; otherwise, an E 
> will be matched and reduced to a xyzzy by rule 13. A B is reduced to xyzzy in 
> rule 4 immediately.

I get the same result for "ABA" as I get for "ABF" - it never reduces to
foo.  I guess it all boils dozn to the fact that I expect bison to
reduce a smuch as possible before starting any error processing, so I
would expect it to reduce foo when I get "AB<any token other than
C/D/E>" instead of the current I-need-an-E-at-least-or-I-error-out-here
behaviour.
I can get the behaviour I want by adjusting the grammar:

  foo: bar { YYACCEPT; } any;

  ...<same old same old>

  any : A | B | C | D | E | F;

but that also gives me 2 S/R and 2 R/R conflicts (understandably); for a
real-world grammar that needs this system (as part of a preprocessor
that needs to handle certain undelimited statements without needing to
parse the surrounding code), the number of conflicts would be huge and
unacceptable.
I guess the solution would be for bison to consider everything past the
declared grammar as the "wild west" - so that any rule that touches the
end of the %start rule has a $default rule that reduces the rule it's
in, so that it will not produce a syntax error until after the %start
rule has been processed.  Even for a "normal" parser this has beneficial
consequences; if actions are used to build an AST, the current behaviour
would mean that an error past the end of a successful parse would likely
prevent the %start rule's action (typically finalizing the AST) from
being executed, possibly making the entire parse pointless.

> I haven't had the time to write a lexer for this grammar to test for the 
> behaviour you mention when you use bison's lalr-engine, but the 
> bison-generated LR-sets seem to represent A(B|C?D?E)+ properly, at least 
> AFAICT from a quick glance, and the tables produced by bison when used in 
> PyBison work properly (PyBison implements the standard bison LALR-parser in 
> Python, using the tabular output of bison).






reply via email to

[Prev in Thread] Current Thread [Next in Thread]