[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Reductions during Bison error handling

From: Paul Hilfinger
Subject: Re: Reductions during Bison error handling
Date: Mon, 13 May 2002 19:44:14 -0700

HA > So GLR is just LR with non-deterministic parsing, but it should behave in
HA > each thread as LR during error recovery. Thus simpler than LALR(1).

Well, almost.  It's a bit sticky in GLR, alas, (whether you using
LALR(1) or LR(1) tables), because you EXPECT lots of erroneous
"branches", even with correct input.  Specifically, when you come to 
something requiring LALR(2), or LR(1), you split off parsers, one of
which you know will fail.  I don't know if anyone has looked into
error recovery in GLR (i.e., when ALL parses fail).  It might not be
optimal to "blame" the longest-surviving parser branch for the error.

HA > Yes, this is what LR does; LALR does not do that because of its
HA > (intentional) table compression flaw. -- I suspect one should use LR for
HA > good error recovery.

Again, Bison has, in effect, TWO information-losing table compression
tricks vs. pure LR:

1. States differing only in lookahead symbols are merged (the LALR

2. Erroneous lookaheads are not distinguished from lookaheads for the 
   most common reduction (an optimization that could apply equally
   well to LR(1) and LALR(1) parsers, as described in ASU, pp 244-245).
   This is the one I've been talking about here; it is orthogonal to 
   the compression effected by the LALR construction.

 > One idea that I have is to allow errors be put on a hierarchy (a directed
 > tree, like in C++ derived classes). The error recovery token would only
 > pick up it and derived errors (lower down in the tree hierarchy). Then one
 > can throw more accurate error when detected, providing a more exact error
 > recovery.

To be honest, however bad panic-mode syntactic error recovery might be
in Bison, in practice it doesn't seem so terrible *given how it is now
used.*   Back in the good old days when I had to wait quite a while
for a compilation, good error recovery was extremely useful, since it
allowed me to get more errors per run, and therefore significantly
affected productivity.  The way I work now, and with the speed of
modern computers, the "compile to the first syntax error, fix, and repeat" 
technique has become increasingly viable over the years.  That's why I
must admit that it's not easy for me to get terribly excited over
major innovation in this area, and I've just been looking to clean up
a few fringes.

If you do want to implement your idea or something more ambitious, do
be sure to check the literature (Corbett's dissertation has quite a
few pointers, plus caveats about some of the apparently attractive
techniques).  It's surprising what can and has been done in this area.
One of my favorites: GNU Ada (GNAT) allows forward declarations of
nested subprograms, which look like this

           procedure Glorp (X: Integer);

and full subprogram definitions of these nested subprograms, like this:

           procedure Glorp (X: Integer) is   -- YYY
              Local_Var: Integer;

           -- XXX

A common error is to write ";" instead of "is" on line -- YYY.  It
then LOOKS like a forward declaration followed by more variable
definitions, followed by the body of the surrounding package or
subprogram.  The compiler does not discover a problem until it finds
that -- XXX is not the end of the file.  The GNAT compiler actually
reports this error by saying that you wrote ";" rather than "is" on
line YYY, even though the syntax error isn't detected until line 
-- XXX.  There are automatic techniques that can perform feats like
this---pretty neat.

Paul Hilfinger

reply via email to

[Prev in Thread] Current Thread [Next in Thread]