Re: [PATCH] doc: cex documentation

bison-patches
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH] doc: cex documentation

From:	Akim Demaille
Subject:	Re: [PATCH] doc: cex documentation
Date:	Thu, 2 Jul 2020 20:46:40 +0200
Hi Vincent,

> Le 30 juin 2020 à 06:47, Akim Demaille <akim@lrde.epita.fr> a écrit :
> 
> Hi Vincent,
> 
>> Le 28 juin 2020 à 21:30, Vincent Imbimbo <vmi6@cornell.edu> a écrit :
>> 
>> Hey Akim,
>> I think I found a good place for the cex documentation in bison.texi, but it 
>> might still be a bit awkward. Tell me if you have any ideas for changes.
> 
> Thanks a lot for this!  I made a few changes (e.g., the name is 
> -Wcounterexamples, plural, moving -Wcex as foremost feature of 3.7, wrap at 
> 76 columns, remove the duplicate example in NEWS which is B&W, sort 
> references, etc.)
> 
> Here's how I installed your commit.  I'll proceed from there.  Thanks!

I believe your work deserve a lot more space than what you did.  Here is my 
shot.  I would be very happy to get feedback, especially from you Vincent.

Cheers!


commit cf1f69b98723fab92c6b2c7be5c678e7cd0a5a80
Author: Akim Demaille <akim.demaille@gmail.com>
Date:   Thu Jul 2 20:35:33 2020 +0200

    cex: give more details about -Wcex and -rcex
    
    * data/bison-default.css: Cobalt does not seem to be supported.
    * doc/bison.texi (Counterexamples): A new section.
    (Understanding): Show the counterexamples as it shows in the report:
    with its items.
    (Bison Options): Document -Wcex and -rcex.

diff --git a/data/bison-default.css b/data/bison-default.css
index 2652e03f..5248c294 100644
--- a/data/bison-default.css
+++ b/data/bison-default.css
@@ -44,7 +44,7 @@
 .cex-0   { color: yellow; }
 .cex-1   { color: green; }
 .cex-2   { color: blue; }
-.cex-3   { color: cobalt; }
+.cex-3   { color: purple; }
 .cex-4   { color: violet; }
 .cex-5   { color: orange; }
 .cex-6   { color: brown; }
diff --git a/doc/bison.texi b/doc/bison.texi
index b45bbca9..c4a60ba9 100644
--- a/doc/bison.texi
+++ b/doc/bison.texi
@@ -53,11 +53,11 @@
 \gdef\colorBlue{%
  \setcolor{\rgbBlue}%
 }
-
-\gdef\rgbWarning{0.50 0 0.50}
-\gdef\colorWarning{%
- \setcolor{\rgbWarning}%
+\gdef\rgbPurple{0.50 0 0.50}
+\gdef\colorPurple{%
+ \setcolor{\rgbPurple}%
 }
+
 \gdef\rgbError{0.80 0 0}
 \gdef\colorError{%
  \setcolor{\rgbError}%
@@ -84,10 +84,10 @@
 @macro colorBlue
 @inlineraw{html, <b style="color:blue">}
 @end macro
-
-@macro colorWarning
+@macro colorPurple
 @inlineraw{html, <b style="color:darkviolet">}
 @end macro
+
 @macro colorError
 @inlineraw{html, <b style="color:red">}
 @end macro
@@ -115,8 +115,12 @@
 @colorBlue{}\text\@colorOff{}
 @end macro
 
+@macro purple{text}
+@colorPurple{}\text\@colorOff{}
+@end macro
+
 @macro dwarning{text}
-@colorWarning{}\text\@colorOff{}
+@purple{\text\}
 @end macro
 
 @macro derror{text}
@@ -9824,7 +9828,158 @@ and understand the parser run-time traces 
(@pxref{Tracing}).
 @node Counterexamples
 @section Generation of Counterexamples
 
-@fixme
+Solving conflicts is probably the most delicated part of the design of an LR
+parser, as demonstrated by the number of sections devoted to them in this
+very documentation.  To solve a conflict, one must understand it: when does
+it occur?  Is it because of a flaw in the grammar?  Is it rather because
+LR(1) cannot cope with this grammar?
+
+On difficulty is that conflicts occur in the @emph{automaton}, and it can be
+tricky to related them to issues in the @emph{grammar} itself.  With
+experience and patience, analysis the detailed description of the automaton
+(@pxref{Understanding}) allows to find example strings that reach these 
conflicts.
+
+That task is made much easier thanks to the generation of counterexamples,
+initially developed by Chinawat Isradisaikul and Andrew Myers
+@pcite{Isradisaikul 2015}.
+
+As a first example, see the example grammar of @ref{Shift/Reduce}, which
+features on shift/reduce conflict:
+
+@example
+$ @kbd{bison if-then-else.y}
+if-then-else.y: warning: 1 shift/reduce conflict [-Wconflicts-sr]
+if-then-else.y: warning: rerun with option '-Wcounterexamples' to generate 
conflict counterexamples [-Wother]
+@end example
+
+@noindent
+Let's rerun @command{bison} with the option
+@option{-Wcex}/@option{-Wcounterexamples}@inlinefmt{info, (the following
+output is actually in color)}:
+
+@ifhtml
+@example
+Shift/reduce conflict on token "else":
+@group
+  Example            @yellow{"if" expr "then"} @blue{"if" expr "then" stmt} 
@red{•} @yellow{"else" stmt}
+  First derivation   @yellow{if_stmt ::=[ "if" expr "then"} @green{stmt ::=[} 
@blue{if_stmt ::=[ "if" expr "then" stmt} @red{•} @blue{]} @green{]} 
@yellow{"else" stmt ]}
+  Example            @yellow{"if" expr "then"} @blue{"if" expr "then" stmt} 
@red{•} @blue{"else" stmt}
+  Second derivation  @yellow{if_stmt ::=[ "if" expr "then"} @green{stmt ::=[} 
@blue{if_stmt ::=[ "if" expr "then" stmt} @red{•} @blue{"else" stmt ]} 
@green{]} @yellow{]}
+@end group
+@end example
+@end ifhtml
+@ifnothtml
+@smallexample
+Shift/reduce conflict on token "else":
+@group
+  Example
+    @yellow{"if" expr "then"} @blue{"if" expr "then" stmt} @red{•} 
@yellow{"else" stmt}
+  First derivation
+    @yellow{if_stmt ::=[ "if" expr "then"} @green{stmt ::=[} @blue{if_stmt 
::=[ "if" expr "then" stmt} @red{•} @blue{]} @green{]} @yellow{"else" stmt ]}
+  Example
+    @yellow{"if" expr "then"} @blue{"if" expr "then" stmt} @red{•} 
@blue{"else" stmt}
+  Second derivation
+    @yellow{if_stmt ::=[ "if" expr "then"} @green{stmt ::=[} @blue{if_stmt 
::=[ "if" expr "then" stmt} @red{•} @blue{"else" stmt ]} @green{]} @yellow{]}
+@end group
+@end smallexample
+@end ifnothtml
+
+This shows two different derivations for one single expression.  That
+demonstrates that the grammar is ambiguous.
+
+@sp 1
+
+As a more delicate example, consider the example grammar of
+@ref{Reduce/Reduce}, which features a reduce/reduce conflict:
+
+@example
+%%
+sequence:
+  %empty
+| maybeword
+| sequence "word"
+;
+maybeword:
+  %empty
+| "word"
+;
+@end example
+
+Bison generates the following counterexamples:
+
+@example
+$ @kbd{bison -Wcex sequence.y}
+sequence.y: @dwarning{warning}: 1 shift/reduce conflict 
[@dwarning{-Wconflicts-sr}]
+sequence.y: @dwarning{warning}: 2 reduce/reduce conflicts 
[@dwarning{-Wconflicts-rr}]
+Shift/reduce conflict on token "word":
+  Example              @red{•} @yellow{"word"}
+  First derivation     @yellow{sequence ::=[} @green{sequence ::=[} @red{•} 
@green{]} @yellow{"word" ]}
+  Example              @red{•} @green{"word"}
+  Second derivation    @yellow{sequence ::=[} @green{maybeword ::=[} @red{•} 
@green{"word" ]} @yellow{]}
+
+Reduce/reduce conflict on tokens $end, "word":
+  Example              @red{•}
+  First derivation     @yellow{sequence ::=[} @red{•} @yellow{]}
+  Example              @red{•}
+  Second derivation    @yellow{sequence ::=[} @green{maybeword ::=[} @red{•} 
@green{]} @yellow{]}
+
+Shift/reduce conflict on token "word":
+  Example              @red{•} @yellow{"word"}
+  First derivation     @yellow{sequence ::=[} @green{sequence ::=[} 
@blue{maybeword ::=[} @red{•} @blue{]} @green{]} @yellow{"word" ]}
+  Example              @red{•} @green{"word"}
+  Second derivation    @yellow{sequence ::=[} @green{maybeword ::=[} @red{•} 
@green{"word" ]} @yellow{]}
+
+sequence.y:8.3-45: @dwarning{warning}: rule useless in parser due to conflicts 
[@dwarning{-Wother}]
+    8 |   @dwarning{%empty    @{ printf ("empty maybeword\n"); @}}
+      |   @dwarning{^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}
+@end example
+
+Each of these three conflicts, again, prove that the grammar is ambiguous.
+For instance, the second conflict (the reduce/reduce one) shows that the
+grammar accept the empty input in two different ways.
+
+@sp 1
+
+Sometimes, the search will not find an example that can be derived in two
+ways.  In these cases, counterexample generation will provide two examples
+that are the same up until the dot.  Most notably, this will happen when
+your grammar requires a stronger parser (more lookahead, LR instead of
+LALR).  The following example isn't LR(1):
+
+@example
+%token ID
+%%
+s: a ID
+a: expr
+expr: %empty | expr ID ','
+@end example
+
+@command{bison} reports:
+
+@smallexample
+Shift/reduce conflict on token ID:
+  First example        @blue{expr} @red{•} @green{ID} @yellow{$end}
+  First derivation     @yellow{$accept ::=[} @green{s ::=[} @blue{a ::=[ expr} 
@red{•} @blue{]} @green{ID ]} @yellow{$end ]}
+  Second example       @purple{expr} @red{•} @purple{ID ','} @green{ID} 
@yellow{$end}
+  Second derivation    @yellow{$accept ::=[} @green{s ::=[} @blue{a ::=[} 
@purple{expr ::=[ expr} @red{•} @purple{ID ',' ]} @blue{]} @green{ID ]} 
@yellow{$end ]}
+@end smallexample
+
+This conflict is caused by the parser not having enough information to know
+the difference between these two examples.  The parser would need an
+additional lookahead token to know whether or not a comma follows the
+@code{ID} after @code{expr}.  These types of conflicts tend to be more
+difficult to fix, and usually need a rework of the grammar.  In this case,
+it can be fixed by changing around the recursion: @code{expr: ID | ',' expr
+ID}.
+
+Alternatively, you might also want to consider using a GLR parser
+(@pxref{GLR Parsers}).
+
+@sp 1
+
+On occasions, it is useful to look at counterexamples @emph{in situ}: with
+the automaton report (@xref{Understanding}, in particular @ref{state-8,,
+State 8}).
 
 @node Understanding
 @section Understanding Your Parser
@@ -9889,62 +10044,6 @@ calc.y: @dwarning{warning}: 7 shift/reduce conflicts 
[@dwarning{-Wconflicts-sr}]
 calc.y: @dwarning{warning}: rerun with option '-Wcounterexamples' to generate 
conflict counterexamples [@dwarning{-Wother}]
 @end smallexample
 
-When given @option{-Wcounterexamples}, @command{bison} will run a search for
-strings in your grammar that better demonstrate you
-conflicts. Counterexample generation was initially developed by Chinawat
-Isradisaikul and Andrew Myers @pcite{Isradisaikul 2015}.  For
-@file{calc.y}, the first printed example is:
-
-@example
-Shift/reduce conflict on token '/':
-  Example              @green{exp '+' exp} @red{•} @yellow{'/' exp}
-  First derivation     @yellow{exp ::=[} @green{exp ::=[ exp '+' exp} @red{•} 
@green{]} @yellow{'/' exp ]}
-  Example              @yellow{exp '+'} @green{exp} @red{•} @green{'/' exp}
-  Second derivation    @yellow{exp ::=[ exp '+'} @green{exp ::=[ exp} @red{•} 
@green{'/' exp ]} @yellow{]}
-@end example
-
-This shows two separate derivations in the grammar for the same @code{exp}:
-@samp{e1 + e2 / e3}.  The derivations show how your rules would parse the
-given example. Here, the first derivation completes a reduction when seeing
-@samp{/}, causing @samp{e1 + e2} to be grouped as an @code{exp}. The second
-derivation shifts on @samp{/}, resulting in @samp{e2 / e3} being grouped as
-an @code{exp}. Therefore, it is easy to see that adding @code{%precedence}
-directives would fix this conflict.
-
-Sometimes, the search will not find an example that can be derived in two
-ways.  In these cases, counterexample generation will provide two examples
-that are the same up until the dot.  Most notably, this will happen when
-your grammar requires a stronger parser (more lookahead, LR instead of
-LALR).  The following example isn't LR(1):
-
-@example
-%token ID
-%%
-s: a ID
-a: expr
-expr: %empty | expr ID ','
-@end example
-
-@command{bison} reports:
-
-@smallexample
-Shift/reduce conflict on token ID:
-  First example        @blue{expr} @red{•} @green{ID} @yellow{$end}
-  First derivation     @yellow{$accept ::=[} @green{s ::=[} @blue{a ::=[ expr} 
@red{•} @blue{]} @green{ID ]} @yellow{$end ]}
-  Second example       @blue{expr} @red{•} @blue{ID ','} @green{ID} 
@yellow{$end}
-  Second derivation    @yellow{$accept ::=[} @green{s ::=[} @blue{a ::=[ expr 
::=[ expr} @red{•} @blue{ID ',' ] ]} @green{ID ]} @yellow{$end ]}
-@end smallexample
-
-This conflict is caused by the parser not having enough information to know
-the difference between these two examples.  The parser would need an
-additional lookahead token to know whether or not a comma follows the
-@code{ID} after @code{expr}.  These types of conflicts tend to be more
-difficult to fix, and usually need a rework of the grammar.  In this case,
-it can be fixed by changing around the recursion: @code{expr: ID | ',' expr
-ID}.
-
-Counterexamples can also be written to a file with @option{--report=cex}.
-
 Going back to the calc example, when given @option{--report=state},
 in addition to @file{calc.tab.c}, it creates a file @file{calc.output}
 with contents detailed below.  The order of the output and the exact
@@ -10176,6 +10275,7 @@ State 7
     exp  go to state 11
 @end example
 
+@anchor{state-8}
 As was announced in beginning of the report, @samp{State 8 conflicts:
 1 shift/reduce}:
 
@@ -10249,6 +10349,27 @@ Conflict between rule 1 and token '-' resolved as 
reduce (%left '-').
 Conflict between rule 1 and token '*' resolved as shift ('+' < '*').
 @end example
 
+When given @option{--report=counterexamples}, @command{bison} will generate
+counterexamples within the report, augmented with the corresponding items
+(@pxref{Counterexamples}).
+
+@example
+Shift/reduce conflict on token '/':
+    1 exp: exp '+' exp •
+    4 exp: exp • '/' exp
+  Example              @green{exp '+' exp} @red{•} @yellow{'/' exp}
+  First derivation     @yellow{exp ::=[} @green{exp ::=[ exp '+' exp} @red{•} 
@green{]} @yellow{'/' exp ]}
+  Example              @yellow{exp '+'} @green{exp} @red{•} @green{'/' exp}
+  Second derivation    @yellow{exp ::=[ exp '+'} @green{exp ::=[ exp} @red{•} 
@green{'/' exp ]} @yellow{]}
+@end example
+
+This shows two separate derivations in the grammar for the same @code{exp}:
+@samp{e1 + e2 / e3}.  The derivations show how your rules would parse the
+given example. Here, the first derivation completes a reduction when seeing
+@samp{/}, causing @samp{e1 + e2} to be grouped as an @code{exp}. The second
+derivation shifts on @samp{/}, resulting in @samp{e2 / e3} being grouped as
+an @code{exp}. Therefore, it is easy to see that adding @code{%precedence}
+directives would fix this conflict.
 
 The remaining states are similar:
 
@@ -11096,6 +11217,13 @@ unexpected number of conflicts is an error, and an 
expected number of
 conflicts is not reported, so @option{-W} and @option{--warning} then have
 no effect on the conflict report.
 
+@item counterexamples
+@itemx cex
+Provide counterexamples for conflicts.  @xref{Counterexamples}.
+Counterexamples take time to compute.  The option @option{-Wcex} should be
+used by the developer when working on the grammar; it hardly makes sense to
+use it in a CI.
+
 @item dangling-alias
 Report string literals that are not bound to a token symbol.
 
@@ -11470,6 +11598,13 @@ each rule's lookahead set.
 Implies @code{state}.  Explain how conflicts were solved thanks to
 precedence and associativity directives.
 
+@item counterexamples
+@itemx cex
+Look for counterexamples for the conflicts.  @xref{Counterexamples}.
+Counterexamples take time to compute.  The option @option{-rcex} should be
+used by the developer when working on the grammar; it hardly makes sense to
+use it in a CI.
+
 @item all
 Enable all the items.
[Prev in Thread]
Current Thread
[Next in Thread]
Re: [PATCH] doc: cex documentation, Akim Demaille <=
- Re: [PATCH] doc: cex documentation, Akim Demaille, 2020/07/03
Prev by Date: news, todo: update
Next by Date: Re: [PATCH] doc: cex documentation
Previous by thread: news, todo: update
Next by thread: Re: [PATCH] doc: cex documentation
Index(es):
- Date
- Thread