help-bison

Re: Finding out when a token is consumed


From: David Fletcher
Subject: Re: Finding out when a token is consumed
Date: Fri, 9 May 2003 20:17:03 +0200

>>>>> "HA" == Hans Aberg <address@hidden> writes:

HA> At 02:26 +0200 2003/05/09, Frank Heckenbach wrote:

HA> We still do not know whether this is a language that was given
HA> to you, or whether the examples you have are just your own
HA> experiments in designing a grammar.

When it comes down to it, does it really matter?

HA> The difference here is that these are some localized, well-defined
HA> features which can be handled by special tricks. You are asking
HA> for something else: fuzzy directives that can be thrown in
HA> anywhere.

From what I can tell, these don't appear to be "fuzzy."  I admit I
haven't been following this too closely, but I don't see the evidence
for your claim, Hans.

>> In a theoretical sense, it might be cleaner to put all such things
>> in a proper grammar (even if it causes some conflicts and might
>> even require a more general and less efficient parsing method). In
>> practice, it's probably better to use an efficient LALR(1) parser
>> for the most part, and do the ugly bits outside of the parser. My
>> question was only about how to interface it to the parser in the
>> least painful way ...

HA> It is difficult to make such interfacing, especially when there is
HA> no way to identify the interfacing segments.

I understand what you're saying, but I don't completely agree.  Here
are a few approaches that I've taken in the past; each has its strong
and weak points, but all of them have worked for me.  Choosing a
particular approach depends on a variety of factors:

        - You can modify your grammar to handle special constructs in
          a direct fashion.  As you have noted, this can "bloat" the
          grammar if you're not careful.  Sometimes I've created
          special grammar rules (leaf rules) that are the only ones to
          process tokens arriving from the lexer, so the rest of the
          grammar doesn't deal with lexer tokens at all.  With care,
          this approach can simplify situations like the one you
          describe.

        - You can modify your lexer to handle the language directives,
          but doing this completely in the lexer can be tricky because
          you have to know how they tie in vis-à-vis the grammar.  It
          sounds like the directives get "pushed up" the syntax tree?
          Even so, without knowing all of the details it appears that
          you might be able to perform some special processing to pull
          this off.  I'd have to look at your examples more closely...

        - Sometimes I've created intermediate code that sits between
          the lexer and parser.  That is, yacc calls MySpecialLex()
          instead of yylex(), and MySpecialLex() will call yylex()
          when needed.  But MySpecialLex() maintains its own state and
          might do special processing at certain times.  Doing this
          can keep the grammar much simpler.  The result is still
          efficient in operation, and easier to support than the
          alternative (modifying the grammar).  I've used this
          approach a number of times, and the deciding factor comes
          down to the coding complexity and resultant support
          cost(s).

        - It may make sense to create an intermediate representation
          from your parser, with the directives attached to this
          representation.  A post-processing step can then "apply"
          these directives in the correct fashion.  I've done this to
          good effect in the past.  If your language is complex enough,
          this may be worth considering.

        - Finally, you might consider altering the parser to insert
          special code to do what you need.  This is... uhm... painful
          with bison and getting harder all the time.

          byacc might be simpler to work with.  If you don't mind
          switching to other tools and grammars, there is a plethora
          of parser generators.  Some are quite interesting (e.g.,
          Elkhound) and well-proven (e.g., ANTLR).  Perhaps a more
          flexible recursive-descent parser might suffice?  For
          example, the latest g++ parser is no longer a yacc-based
          parser, but a hand-written r.d. parser.  I know that there
          are r.d. parser generators available, and some appear to be
          quite good.  But, it may be more work to switch from the
          (quite dated) yacc syntax to something else, so modifying
          the yacc grammar might be the way to go.  It sounds like
          you've already done this using the "approved interface," but
          this interface appears to be lacking.

HA> If you want to, say, implement operators with dynamic precedences,
HA> then this can be done by writing Bison rules that save the
HA> expression components on a stack, and then writing a special
HA> program that sorts out how they should be combined. So here one is
HA> saving the semantics and sorting it out later. If you want to make
HA> use of lexer context switches, you must make sure they do not
HA> clash with parser lookaheads.

This sounds rather complicated and prone to failure.  I realize that
certain languages are implemented this way, but my experience is that
it leads to hard-to-maintain code.  Most people don't take this into
account, even though maintenance accounts for around 70% of the cost
of a software system.  If the code is to survive, whether open source
or not, it needs to be easy to maintain.

HA> ...suggest that you are designing your own language where
HA> directives can be thrown in just anywhere. This is a poor language
HA> design, because you do not get semantics attached to parsing tree.

It doesn't sound to me like the directives are thrown in willy-nilly.
Instead, it sounds like this gentleman is looking for more powerful
ways to handle directives that are potentially well-defined, so that
these directives can be processed once instead of in myriad places in
the grammar.  He already has a solution that works; it just uses a
macro that wasn't intended for this purpose.  If this macro works, why
not use it?  Perhaps the bison maintainers should consider expanding
on the macro?

From where I sit, this is a question of the best way to code for
simplicity, reduced maintenance, and elegance more than anything
else.

Just MHO.

--
David Fletcher                          Tuscany Design Automation, Inc.
address@hidden                     5875 S. Danube Circle
303.690.4309                            Aurora, CO 80015-3169 USA
