help-bison

Re: Finding out when a token is consumed


From: Frank Heckenbach
Subject: Re: Finding out when a token is consumed
Date: Sat, 10 May 2003 04:51:22 +0200

Hans Aberg wrote:

> At 07:06 -0600 2003/05/09, David Fletcher wrote:
>
> > Here
> > are a few approaches that I've taken in the past, and all have their
> > strong and weak points, but all of the approaches have worked for me.
> > Choosing a particular approach depends on a variety of factors:
> >
> >     - You can modify your grammar to handle special constructs in
> >       a direct fashion.  As you have noted, this can "bloat" the
> >       grammar if you're not careful.  Sometimes, I've created
> >       special grammar rules (leaf rules) that only process tokens
> >       arriving from the lexer, and the rest of the grammar doesn't
> >       deal with tokens arriving from the lexer at all.  With care,
> >       this approach can simplify situations like you describe.

I thought of something like this, but I couldn't figure out a way to
achieve it without affecting most grammar rules. I.e., what would be
the LHS of these leaf rules? A new nonterminal, I suppose. But then
I'd have to allow for this nonterminal within my regular grammar
rules (which would mean many places). Or am I missing something?
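For concreteness, the leaf-rule idea might look like this hypothetical
fragment (token and rule names are invented); every former use of IDENT
has to become the nonterminal ident, which is exactly the spread through
the grammar worried about above:

```yacc
/* Hypothetical sketch of "leaf rules": the token is processed in
   exactly one place, but the rest of the grammar must now refer to
   the nonterminal ident instead of the token IDENT. */
%token IDENT DIRECTIVE

%%

ident:
    IDENT               { /* process the raw token here, once */ }
  ;

expr:
    ident
  | expr '+' ident
  ;
```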

> >     - You can modify your lexer to handle the language directives,
> >       but doing this completely in the lexer can be tricky because
> >       you have to know how they tie in vis-à-vis the grammar.  It
> >       sounds like the directives get "pushed up" the syntax tree?
> >       Even so, without knowing all of the details it appears that
> >       you might be able to perform some special processing to pull
> >       this off.  I'd have to look at your examples more closely...

Like many parsers, I don't actually build a parse tree as a
structure in memory, but I handle many semantic actions right after
they're parsed, and this is (in various ways) influenced by the
directives. So, conceptually, I think one could say the directives
are pushed up the syntax tree. IIUIC, this seems similar to what
Hans suggested in the other mail ("stamp your lexemes with types in
the lexer already") ...

> >     - Sometimes I've created intermediate code that sits between
> >       the lexer and parser.  That is, yacc calls MySpecialLex()
> >       instead of yylex(), and MySpecialLex() will call yylex()
> >       when needed.  But, MySpecialLex() maintains its own state,
> >       might do special processing at certain times.  Doing this
> >       can keep the grammar much simpler.  The result is still
> >       efficient in operation, and easier to support than the
> >       alternative (modifying the grammar).  I've used this
> >       approach a number of times, and the deciding factor comes
> >       down to the coding complexity and resultant support
> >       cost(s).

I have experience with using intermediate code (for other reasons),
but here, I think, it would fail for the lookahead problem (which
cannot be solved completely outside the parser AFAICS).

> >       byacc might be simpler to work with.  If you don't mind
> >       switching to other tools and grammars, there is a plethora
> >       of parser generators.  Some are quite interesting (e.g.,
> >       Elkhound) and well-proven (e.g., ANTLR).  Perhaps a more
> >       flexible recursive-descent parser might suffice?

Probably not. AFAIK, RD is weaker than LALR(1), and my grammar is
complex enough so I need LALR(1) (in fact, I'm even thinking about
using GLR in the future to solve some difficult parts).

Apart from this, bison has worked very well for me over the years,
and I wouldn't like to replace it, especially for such a peripheral
issue ...

> > It doesn't sound to me that the directives are thrown in willy-nilly.
> > Instead, it sounds like this gentleman is looking for more powerful
> > ways to handle directives that are potentially well-defined, so that
> > these directives can be processed once instead of myriad places in the
> > grammar.  He already has a solution that works, it just uses a macro
> > that wasn't intended for this purpose.  If this macro works, why not
> > use it?  Perhaps the bison maintainers should consider expanding on
> > the macro?

BTW, I never suggested or demanded that the bison maintainers change
anything. I realize that my needs are probably quite special and
might not justify a change to bison code.

> We are told that there is no specific given grammar for those extra
> directives, but they can be thrown in just anywhere. So how could those
> methods work then?

I hope I can clear up this confusion at least. If directives can be
thrown in anywhere (between tokens), there still is a grammar: Just
change the original grammar to accept directives before each
terminal. This grammar is still context-free. However, when I tried
this in the obvious way, I got R/R conflicts (even with the simple
example), and I didn't see an easy way to remove them. I'm not even
sure if the grammar is still LALR(1) or even LR(1) or whatever.

So there *is* a grammar, but it might be quite impractical (because
it might not allow for efficient parsing and isn't written in the
most natural form).
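The transformation described above might look like this hypothetical
fragment (token names invented); the empty alternative of the dirs
prefix is typically where the conflicts appear, since the parser has
to reduce dirs before knowing which terminal follows:

```yacc
/* Hypothetical sketch: accept directives before every terminal. */
%token IDENT DIRECTIVE
%left '+'

%%

dirs:
    /* empty */
  | dirs DIRECTIVE
  ;

expr:
    dirs IDENT
  | expr dirs '+' expr
  ;
```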

> > From where I sit, this is a question of the best way to code for
> > simplicity, reduced maintenance, and elegance more than anything
> > else.
> 
> You mean, if one first only gets a working version? Personally, I prefer
> accurate, structured software that works over code written for simplicity,
> reduced maintenance, and elegance more than anything else with a lot of
> bugs in it.

Of course, my first priority is that the code is correct. That
rules out any methods that have bugs (such as ignoring the
look-ahead problem).

Given this, I agree that there's sometimes a conflict between
structured and elegant/maintainable code. I usually prefer the
former actually, but when there's an "O(n) decision" (here, a
localized abuse of the location macro in one place vs. extra code
necessary in each action), I choose the more maintainable code.

Frank

-- 
Frank Heckenbach, address@hidden
http://fjf.gnu.de/
GnuPG and PGP keys: http://fjf.gnu.de/plan (7977168E)



