Re: glr: include the created header

From: Joel E. Denny
Subject: Re: glr: include the created header
Date: Wed, 5 Jul 2006 17:45:51 -0400 (EDT)

On Wed, 5 Jul 2006, Akim Demaille wrote:

> >>> "Joel" == Joel E Denny <address@hidden> writes:
> Sorry for the lag,

No problem.  I've been occupied as well.

> So we should not have to rely
> on explicit directive names to specify that order: let the order
> itself be the specification of the order into which things are to be
> emitted.

That would be nice... except Bison provides capabilities that (I think) 
make that impossible.  (Sorry if some of the next few paragraphs are 
repeating earlier discussion, but I'm thinking it's important to 

For example, multiple %union's (which, by the way, I *am* liking more and 
more since it's really nice for separation of concerns).  If I declare a 
%union followed by a %{...%} followed by a %union, how can Bison possibly 
generate the target code in that order?  Bison has to concatenate the 
%union's together into a single YYSTYPE.  So, how does Bison determine 
which %{...%} are placed before and which are placed after the YYSTYPE?  
Should the user have to learn that it's the *first* %union that divides 
the %{...%}'s?  That's how it works now, but I think that's exposing the 
mess in a subtle way.  For example, the following looks right, but it 
doesn't compile:

  %{ #include "type1.h" %}
  %union { type1 field1; }
  %destructor { free1 ($$); } <field1>
  %printer { print1 ($$); } <field1>
  %type <field1> a b c

  %{ #include "type2.h" %}
  %union { type2 field2; }
  %destructor { free2 ($$); } <field2>
  %printer { print2 ($$); } <field2>
  %type <field2> d e f
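
To spell out why, here is a rough C sketch of the order that *would* compile. The two typedefs are stand-ins I made up for whatever type1.h and type2.h really declare; only the ordering matters. Bison has to emit a single YYSTYPE holding both fields, so both types must already be declared at that point:

```c
/* Stand-ins for type1.h and type2.h (hypothetical contents). */
typedef struct { int id; } type1;    /* from the first %{...%}  */
typedef struct { double w; } type2;  /* from the second %{...%} */

/* Both %union's concatenated into the one YYSTYPE Bison emits. */
typedef union YYSTYPE {
  type1 field1;
  type2 field2;
} YYSTYPE;

/* Helper (hypothetical) showing that both fields are usable. */
static int yystype_holds_both(void) {
  YYSTYPE v;
  v.field1.id = 7;
  int ok = (v.field1.id == 7);
  v.field2.w = 1.5;
  return ok && v.field2.w == 1.5;
}
```

With the grammar above, the second #include is instead emitted after YYSTYPE, because only the first %union divides the %{...%}'s.  So type2 is still undeclared where the union needs it.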

In your %module/%type proposal, I believe this problem still exists within 
each %module.  I assume it would just be %type rather than %union that 
divides the prologue.  And now the user has to figure out the relationship 
between inter- and intra-%module declaration order.

Another example is the C/C++ header.  To simplify the discussion, assume 
the user declares only one %union.  Should Bison insert every %{...%} from 
before the %union into the header, and should it insert every %{...%} from 
after the %union only into the code file?  What if there is no %union?  
Does the user now lose the ability to put code in his header, or does he 
lose the ability to put code only in his code file?  How does either loss 
make any sense?  It seems quite subtle.  Then add multiple %union's and it 
gets even more subtle.  Add %module... yikes.

Early on, I proposed a %header{...%} instead of the 4 %*-header directives.  
(This is similar to your %private and %public.)  The problem then is, what 
if the user declares %header{...%} followed by %{...%} followed by 
%header{...%}?  Bison can't possibly generate the code in that order since 
%header{...%} goes in the header but %{...%} only goes in the code file.  
I tried adding the restriction that you're just not allowed to declare in 
that order, but I found that even more confusing... and we still have the 
multiple %union and %module problems.

In general, declaration order != generation order.  Take two declarations 
in the grammar file that each contain a passage of C code.  Those passages 
may appear in the header/code files with other C code inserted between them 
even though that other C code did not appear between them in the grammar 
file.  At the same time, some of the declarations that appeared in the 
grammar file between those two declarations may not (in any sense) appear 
between the same passages in the generated header/code files.

In light of that behavior, my view is that the language of the Bison 
grammar file is not C and we shouldn't try to convince the user that it 
works like C because that's a misleading claim.  Instead, the language is 
a series of declarations some of which contain C code, and we should 
define what each of those declarations do in the most flexible and clear 
way possible.

>  > Regardless, if I want to block the typedef for YYLTYPE, I have to
>  > know this.
> That's because Bison doesn't know this either.  It is dead wrong to
> continue to rely on #defines to define this kind of stuff.

I don't like #define's either.  Something better for YYLTYPE would be 

> With
> genuine directives Bison will know enough to know where to put, or not
> to put, YYLTYPE etc.

I agree that something like %location-type would help, but I'm not 
convinced it's enough given the list of other problems above.

> In your original proposal there was no YYSTYPE, and I thought you meant
> to have bison parse this to find the different parts.

Sorry about that.  It was a typo.  I meant for it to be literal C code.

>  >> She should continue to declare as usual, and let *us* deal with
>  >> the mess underneath.
>  > I'd like that.  The %union-dependent approach exposes the mess in subtle 
>  > ways (one example is above).  The %*-header approach at least makes the 
>  > mess explicit.  Another solution may be to have %semantic-type and 
>  > %location-type and have Bison generate the location type before the 
>  > semantic type.
> We're converging :)

Yes, but we still seem to disagree on the issue of position-independent 
Bison declarations.  I was thinking that, unlike %union, %semantic-type 
and %location-type would have no influence on the placement of %{...%}, 
%private, %public, or whatever.  That is, I was thinking the generated 
order would be documented (for the user) as always something like:

  ------------------------------------
  enum yytokentype
  %location-type code
  #ifndef YYLTYPE
    typedef struct YYLTYPE {
    } YYLTYPE;
  #endif
  %semantic-type code
  #ifndef YYSTYPE
    typedef union YYSTYPE {
    } YYSTYPE;
  #endif
  %public code
  ------------------------------------
  %private or %{...%} code

where the header is between the dashed lines, and the code file contains 
all of the above.

As I demonstrated in an earlier post, I was thinking the user could define 
location type and semantic type dependencies in %location-type and 

The user could also define YYLTYPE and YYSTYPE themselves in 
%location-type and %semantic-type if he wished.  Of course, in the above 
proposal the user would have to then #define YYSTYPE YYSTYPE or #define 
YYLTYPE YYLTYPE.  That's ugly.  To solve this, we could introduce a syntax 
like the following:

  %semantic-type "SemanticType" {
    #include "type1.h"
    #include "type2.h"
    typedef union SemanticType {
      type1 field1;
      type2 field2;
    } SemanticType;
  }

The `"SemanticType"' would tell Bison to simply typedef SemanticType 
YYSTYPE.  If the user omitted `"SemanticType"', Bison would generate its 
own definition for YYSTYPE, and the user should only put semantic type 
dependencies in the braced code.  In case `"SemanticType"' is actually 
something simple like `"char *"', the user could omit the braced code.
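
As a sketch of what the generated C might then look like (this is assumed output for the proposal, not anything Bison does today, and the two typedefs stand in for the unknown contents of type1.h and type2.h), the braced code would be copied through and Bison would add only the final typedef:

```c
#include <string.h>

/* Hypothetical stand-ins for the contents of type1.h and type2.h. */
typedef int type1;
typedef const char *type2;

/* Braced code from %semantic-type, copied through verbatim. */
typedef union SemanticType {
  type1 field1;
  type2 field2;
} SemanticType;

/* The only line Bison would add itself under this proposal. */
typedef SemanticType YYSTYPE;

/* Helper (hypothetical): parser-side code keeps using YYSTYPE. */
static int yystype_is_semantic_type(void) {
  SemanticType user;
  YYSTYPE parser;
  user.field2 = "abc";
  parser = user;  /* same type, so plain assignment works */
  return strcmp(parser.field2, "abc") == 0;
}
```

No #define YYSTYPE YYSTYPE is needed in this arrangement, since Bison knows from the directive that the user has supplied the type.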

By the way, %public and %private sound like they're named after OO 
concepts, which are similar but not exactly the same.  How about just 
%header {...} and %{...%}?  Languages like Java will have no use for 
%header/%private, right?  %private might fool Java users, and %header 
should be more explicitly useless to Java.  With either name, we should 
probably disallow it when it's useless.

So, the above proposal seems to avoid the subtleties of order-dependent 
declarations.  It seems to be higher-level and more 
target-language-independent than %*-header.  It seems compatible with your 
modular grammar proposal and your `%type {char *} a b c' (where there is 
no %union to divide the prologue, right?).  It loses a little flexibility 
relative to %*-header in that the user can't place code at the top of his 
code file or at the top of his header file (unless he wants to abuse 
%location-type), but maybe that flexibility isn't necessary now.

>  >> I've not sorted out the details yet, that's way ahead, but I'm thinking
>  >> about something like
>  >> 
>  >> return yy_symbol (token-type, token-value, token-location);
>  > What is token-type?
> The int value returned by yylex, the kind of token that was returned
> (BRACED_CODE, INT, etc.).

In order to strongly type the token-value parameter in languages like C, 
maybe we should create a family of functions something like:

  return yy_symbol_BRACED_CODE (token-value, token-location);
  return yy_symbol_INT (token-value, token-location);

Then again, maybe I'm misunderstanding.  Are you thinking yy_symbol is a 
function that returns a struct/class containing the three parameters 
(type, value, and location)?  That's what I was thinking.
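
To make that concrete, here is a minimal C sketch of the combination I have in mind: one typed constructor per token kind, each returning a struct that carries type, value, and location.  Apart from the yy_symbol_INT and yy_symbol_BRACED_CODE names, everything here (the struct name yy_symbol_type, the YYSTYPE fields, the token numbers) is made up for illustration:

```c
/* Minimal stand-ins for what Bison would normally generate. */
typedef struct YYLTYPE {
  int first_line, first_column, last_line, last_column;
} YYLTYPE;

typedef union YYSTYPE {
  int ival;          /* value of an INT token         */
  const char *code;  /* value of a BRACED_CODE token  */
} YYSTYPE;

enum yytokentype { BRACED_CODE = 258, INT = 259 };

/* Hypothetical result type bundling the three parts of a symbol. */
typedef struct yy_symbol_type {
  enum yytokentype type;
  YYSTYPE value;
  YYLTYPE location;
} yy_symbol_type;

/* One constructor per token kind, so the C compiler checks the
   type of the value argument against the token's semantic type. */
static yy_symbol_type yy_symbol_INT(int value, YYLTYPE location) {
  yy_symbol_type s;
  s.type = INT;
  s.value.ival = value;
  s.location = location;
  return s;
}

static yy_symbol_type yy_symbol_BRACED_CODE(const char *value,
                                            YYLTYPE location) {
  yy_symbol_type s;
  s.type = BRACED_CODE;
  s.value.code = value;
  s.location = location;
  return s;
}
```

A scanner would then write something like `return yy_symbol_INT (42, loc);`, and passing a string where an int is expected would be diagnosed by the compiler.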

By the way, I like the idea of equating a semantic type with a type in the 
target language rather than with a union field name.  I also see how 
modules might conflict on a field name, but types don't have that problem.

