Re: %union foo bar baz and others { ... }


From: Paul Eggert
Subject: Re: %union foo bar baz and others { ... }
Date: 22 Jan 2003 15:13:14 -0800

Akim Demaille <address@hidden> writes:

> I'm curious: I would really like to know why you think the "dirty
> hack" is dirtier than the current solution.

The dirty hack places undesirable constraints on the parser, since it
requires that actions be executed in a particular order, with the
order of actions' side effects being quite important.  For example, I
don't offhand see how it would work correctly if we switch to a GLR
parser: I suppose it might work, but it might not (particularly in
error situations).

Another way to put it is that the dirty hack is a path from the parser
to the lexer, where the parser communicates context to the lexer, and
the lexer behaves differently depending on context supplied by the
parser.  This leads to well-known problems; it is similar to the
problems with typedef'd identifiers in C.  It is undesirable for that
reason.

In the Bison 1.875 approach, there is no such communication.  The
lexer is responsible for keeping track of the context.  The parser
doesn't need to worry about supplying the context to the lexer.  And
the context does not need to be recorded in a static variable, so this
doesn't hurt the reentrancy of the lexer or parser.  These are all
technical advantages.

Of course, these advantages do not come without cost, since it does
complicate the lexer to scan the larger "token" that includes both
(say) "%union" and the braced code that comes after the "%union".

> And again, to me, the "dirty hack" is just a path to parsing the
> actions elsewhere, i.e., it is the best approximation for the time
> being, the code that leaves the scanners and parsers as close as
> possible to what they will be in the future.

To understand this argument better I'd like to know more about how you
plan to deal with scanning actions in the future.  A naive approach
would be to scan the braced code twice: once in scan-gram.l, which
simply returns a string containing the action's contents; and once in
a routine invoked by the semantic analyzer that rescans the string
looking for $$, $1, etc., and substituting as it goes.
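
In case it helps to have something concrete to poke at, here is a very
rough sketch of what that second, substituting pass might look like
(hypothetical code with simplified output names; real translation also
needs the rule length, the %union member names, @N, and so on):

    #include <ctype.h>
    #include <stdio.h>

    /* Naively rewrite $$ and $N in ACTION into OUT.  Note that this
       blindly rewrites '$' even inside string literals and comments,
       which is exactly why the full C lexical rules matter.  */
    static void
    expand_dollars (const char *action, char *out, size_t outsize)
    {
      size_t o = 0;
      size_t i = 0;
      while (action[i] && o + 16 < outsize)
        {
          if (action[i] == '$' && action[i + 1] == '$')
            {
              o += sprintf (out + o, "yyval");
              i += 2;
            }
          else if (action[i] == '$' && isdigit ((unsigned char) action[i + 1]))
            {
              int n = 0;
              i++;
              while (isdigit ((unsigned char) action[i]))
                n = n * 10 + (action[i++] - '0');
              o += sprintf (out + o, "yyvsp[%d]", n);
            }
          else
            out[o++] = action[i++];
        }
      out[o] = '\0';
    }

    int
    main (void)
    {
      char buf[256];
      expand_dollars ("{ $$ = $1 + $3; }", buf, sizeof buf);
      puts (buf);  /* prints: { yyval = yyvsp[1] + yyvsp[3]; } */
      return 0;
    }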

Unfortunately this naive approach would have other problems: we would
have to write two scanners for actions, and each scanner would have to
know the intricacies of C lexical analysis, including such
brain-damaged features as UCNs, multibyte characters, and
backslash-newline.  (The problems would be different with non-C
languages, of course, but I'm just trying to see how the C solution
would work.)

So I guess you must be thinking of another approach, in which
scan-gram.l escapes $ and @ in a safe way, such that the semantic
analyzer can simply walk through the string looking for a single
escape character without having to understand the lexical rules of C.
A downside of this approach is that the braced code will still need to
be scanned twice, with twice the CPU and memory costs; but I guess
efficiency is not that big of a deal these days.  However, another
downside is that both scans will have to worry about multibyte
characters, which will be a hassle; or we'll have to convert back and
forth between multibyte and wide characters, which will be another
hassle.
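
To make sure I understand my own guess, here is roughly what I have in
mind (purely illustrative; the marker byte and the function name are
made up):

    #include <stdio.h>

    #define MARK '\1'  /* assumed marker byte that cannot occur in user code */

    /* Copy ACTION into OUT, putting MARK in front of every '$' and '@',
       so that a later pass only has to look for MARK instead of
       understanding C's lexical rules.  OUT must be big enough.  */
    static void
    escape_action (const char *action, char *out)
    {
      for (; *action; action++)
        {
          if (*action == '$' || *action == '@')
            *out++ = MARK;
          *out++ = *action;
        }
      *out = '\0';
    }

    int
    main (void)
    {
      char buf[128];
      const char *p;
      escape_action ("{ $$ = $1; @$ = @1; }", buf);
      for (p = buf; *p; p++)
        if (*p == MARK)
          printf ("marker before '%c'\n", p[1]);
      return 0;
    }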

But perhaps I'm off on the wrong track here....


> Your code, on the contrary, to my eyes, immensely complicates the
> scanner.

It adds 23 lines to the scanner, if you count both the installed patch in 
<http://mail.gnu.org/archive/html/bison-patches/2002-12/msg00015.html>
(which grows it by 31 lines) and the proposed patch in
<http://mail.gnu.org/archive/html/bison-patches/2003-01/msg00056.html>
(which shrinks it by 8 lines).  I agree that this is a complication,
but I'm not sure I agree that it is an immense one; to my eyes it is
only slightly more complicated than the "dirty hack" that it replaces,
and it does have the technical advantages mentioned above.

I suspect that any attempt to go to the "future approach" that I
guessed at above will require many more changes to the parser and
lexer than the 20 to 30 lines at issue here.  The switch to the
"future approach" will be dozens or hundreds of times more
complicated.  I don't see why this current minor change would have
much effect on the feasibility of the big "future approach" change.



