Re: glr: include the created header

bison-patches

From:	Joel E. Denny
Subject:	Re: glr: include the created header
Date:	Wed, 28 Jun 2006 07:07:09 -0400 (EDT)

On Wed, 28 Jun 2006, Akim Demaille wrote:

> I'm saying that we should strive to avoid the spirit that C brought
> to this world, and rather try to find something comparable to more
> modern and simpler trends.  Transposed to the programming language
> world, that would be C# or Java, granted, but I am not referring to
> them as target language, just the spirit.  Whether we target C or
> C# is irrelevant.

OK, I think I'm beginning to see what you mean.  Sorry for my
previous misunderstandings.

> In this case the user can simply put things before and things after
> the %union.  What if some day we add a means to define some other
> type, say %location-struct {...} defining YYLTYPE.  Would we want
> to also have different pre and post primitive to order them, and
> users should choose which to use?

If the order was originally dependent on %union, it's unclear to me what 
should happen.  If the order was originally declared by %*-header 
declarations, I don't see the need for any additional declarations just 
because we add %location-struct.

This is exactly why I don't like order dependence among the Bison 
declarations.  It doesn't scale.  It raises too many questions about what 
happens when we invent new declarations, when the user has multiple 
occurrences of a declaration, when the user omits a declaration 
altogether, etc.  I figured it was better just to stop guessing at what 
the user wants and let him declare it explicitly.  A simplifying 
assumption is that all Bison-generated header definitions are grouped 
together, and the user has no reason to break that group apart.

> My point is that whether there is a pre or post %union is purely
> an implementation detail.  Starting we the problem that in C we
> can forward declare structs and unions, but not typedefs.  The
> origin is also due to the fact that original Yacc introduced
> %union instead of asking the user to define YYSTYPE in some way
> or another in the prologue.

I agree that there is something appealing about letting the user define 
the union directly in C/C++.  Paul has talked about adding detection of 
<tag> usage so that Bison will still behave appropriately: it won't 
generate YYSTYPE, and I believe a few warnings are turned on by it. 
Moreover, not all languages have unions, so pushing the semantic type 
definition completely into the target language eliminates the need for a 
new semantic type declaration for each new language.
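
A minimal C sketch of the forward-declaration point quoted above, assuming a
user-defined union tagged YYSTYPE with the two placeholder int fields used
elsewhere in this thread.  read_field1 is a hypothetical helper for
illustration only, not something Bison provides:

```c
/* Forward declaration: the tag is known here, the members are not.  */
union YYSTYPE;

/* A pointer to the incomplete type is enough for a prototype.
   read_field1 is a hypothetical helper, not part of Bison.  */
int read_field1 (const union YYSTYPE *value);

/* The complete definition can come later, for example after the
   headers that declare the field types have been included.  */
union YYSTYPE
{
  int field1;
  int field2;
};

int
read_field1 (const union YYSTYPE *value)
{
  return value->field1;
}
```

Code that only passes pointers around can be written against the forward
declaration; anything that touches a member needs the full definition first.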

I suppose we could drop the %*-header declarations and we could add a 
%semantic-type (and maybe one day a %location-type) that contains literal 
code in the target language.  In it, the user would write a consolidated 
definition of the semantic type plus dependencies:

  %semantic-type {
    #include "type1.h"
    #include "type2.h"
    union {
      int field1;
      int field2;
    };
  }

For C and C++, Bison would place it in the header where appropriate, and 
Bison would know not to generate its own default YYSTYPE.  For any other 
target language, Bison would place YYSTYPE wherever necessary.  In that 
way, unlike the %*-header declarations, it's not peculiar to any 
particular target language.  It's generally useful.

In this case, the %{...%} code would go into the code file *only after* 
the header (where it could see all Bison-generated definitions above).  
That is, since %semantic-type would contain the semantic type 
dependencies, there would be no reason to divide the pre-prologue from the 
post-prologue.  That eliminates the subtle order dependence that bugged 
me.

The user who wants to add additional stuff to the Bison-generated header 
would be out of luck.  He could just write his own header that wraps the 
Bison-generated one.  That's easy enough.

%semantic-type couldn't be broken apart like %union simply because a union 
in C/C++ can't.  You'd have to define all fields up front instead.  For 
C/C++ programmers, that shouldn't be a surprise.

> I very much agree we should eliminate the differences between skeletons, 
> and I find your work very useful in this regard!  Something has to be 
> done, and it's great that you care to address this issue.  By no means 
> am I saying that the current (well, previous :) situation is satisfying.

Great!

> > Once a user has recognized a need for finer
> > control, do you think he'll find these declarations confusing or too
> > complex?
> 
> Yes.  I think they are too complex.  They are too low level IMHO.

Maybe so.

> Dressing the simple fact that order matters in the
> introduction of types and functions in C with new directives
> seems wrong to me: order matters, period.  Order matters inside,
> order matters outside.

Sometimes trying to support an idea with general principles distracts from
what I'm really after.  I probably should've stayed away from the general
concept of order dependence.

I'll be more specific: I don't like that the grammar file position of the 
%union is important and that this importance is revealed to the user in 
subtle and ugly ways.  Yet somehow we seem to be trying to hide the 
concept of the pre- and post-prologues.  It's not working.  You're also 
showing distaste for %union, so I think we're somewhat in agreement on 
this point.

The %*-header solution was an attempt to surrender to C/C++ and make 
everything totally explicit for the user rather than trying to guess what 
he wants and confusing him.  You're saying there ought to be a better way 
than %*-header.  Maybe so.

> > > We already teach our dear users that they should prototype
> > > in the prologue the functions they use in the core grammar, but
> > > that are defined in the epilogue.  I have no plan to make this
> > > commutable.  This is C!
> > 
> > I wouldn't suggest that either.  But there's only one epilogue and it
> > always has the same position in the grammar file.
> 
> Well, personally I think it would be useful nevertheless!  To be
> able to group things together.  But I do not think that a new
> %epilogue {...} directive should do that, rather I expect to use
> some form of scoping, or to rely on %import.

No arguments there.  I just meant that the epilogue is easy to find and 
has no subtle dependencies on something seemingly (at first glance) 
unrelated like %union.

> > That's different than
> > an unlimited number of prologue declarations spread throughout the
> > declarations section such that, for each one, you have to hunt to find out
> > whether it's before or after the %union.  And if there is no %union, then
> > what happens?
> 
> The difference between before and after is then irrelevant!

It's relevant if you're writing a struct definition that requires YYLTYPE
or the enum.  You want it to be after.  If you're writing code that
typedefs YYSTYPE, then you have to #define YYSTYPE YYSTYPE beforehand in
order to prevent the Bison-generated typedef.  My point here is only that
there are many cases to consider and document.  It's messy.
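
The second case can be sketched in plain C.  This is only an illustration
under stated assumptions: the #if block below is a simplified stand-in for
the guard found in Bison-generated code, the struct members are made up, and
semantic_number is a hypothetical helper:

```c
/* User code that must appear before the generated definitions.  */
typedef struct
{
  int number;        /* made-up members, for illustration only */
  const char *text;
} YYSTYPE;
/* Defining the macro is what the guard below checks for.  */
#define YYSTYPE YYSTYPE

/* Simplified stand-in for the guard in Bison-generated code: the
   default typedef is emitted only when YYSTYPE is not already a macro.  */
#if !defined YYSTYPE && !defined YYSTYPE_IS_DECLARED
typedef int YYSTYPE;
# define YYSTYPE_IS_DECLARED 1
#endif

/* Hypothetical helper showing that the user's type is the one in effect.  */
int
semantic_number (YYSTYPE value)
{
  return value.number;
}
```

If the user typedef and #define came after the guard instead, the default
typedef would already be in place and the user's typedef would conflict
with it.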

Thinking about it more, the %*-header declarations require you to know all 
this, so yeah I can see that they expose too much.  %semantic-type seems 
cleaner.

> But maybe I have not been clear on what's on my mind.  I am not
> pretending there should not be a %private {...} directive for
> matters that goes only in the *.c file.  Actually, I would
> also propose to introduce %public {...} as an alias for
> %{...%} for symmetry.

For consistency with Yacc, I'd say %private would be the alias for
%{...%}.  If we have to have something like %{...%} for the header, I
think it should not be a Yacc construct, or we'd have a confusing
inconsistency again.

> > Why was the ability to concatenate several %union's together added?  If
> > the freedom of code organization that this allows is still appealing, then
> > why is the following not appealing?:
> > 
> >   %start-header { #include "type1.h" }
> >   %union { type1 field1; }
> >   %destructor { free1 ($$); } <type1>
> >   %printer { print1 ($$); } <type1>
> >   %type <type1> a b c
> > 
> >   %start-header { #include "type2.h" }
> >   %union { type2 field2; }
> >   %destructor { free2 ($$); } <type1>
> >   %printer { free2 ($$); } <type1>
> >   %type <type2> d e f

Hmm... so many typos.

> Your code demonstrates exactly my point: the order matters, and so
> much to your eyes that you would never have put the %start-header
> after the %union.

I agree completely.  However, you can see why %{...%} would not work in 
place of %start-header above.  The second %{...%} would be placed after 
the %union.

> I think that what you try to write here should be something offered
> by %import.  We need a form of global scoping, not a low-level
> form of issuing code in this or that section.
> 
> I would rewrite your code into something like
> 
> %module field1
> {
>   %public { #include "type1.h" }
>   %private { void free1 (type1); }
>   %union { type1 field1; }
>   %destructor { free1 ($$); } <type1>
>   %printer { print1 ($$); } <type1>
>   %type <type1> a b c
>   %%
>   a: b | c;
>   %%
>   void free1 (type1 t)
>   {
>     free (t);
>   };
> }
> 
> %module field2
> {
>   %public { #include "type1.h" }
>   %private { void free1 (type1); }
>   %union { type2 field2; }
>   %destructor { free2 ($$); } <type2>
>   %printer { free2 ($$); } <type2>
>   %type <type2> d e f
>   %%
>   d: e | f;
>   %%
>   void free2 (type2 t)
>   {
>     free (t);
>   };
> }

I am definitely interested in these ideas on modular grammars.

However, I'm becoming convinced that %union is bad.  The above would still
look OK even if you: (1) move all the %public and %union code into a
single global %semantic-type, and (2) convert %private to %{...%}.  Also,
I imagine the distinction between %public and %private wouldn't be useful
in Java, and I'm now thinking it's nice if we can use the same Bison
constructs in all languages.

> As an aside, I would like to introduce %type {...} in addition
> to %union + %type <>, because I think that the name of the fields
> is a private matter that the user should not know about.

The scanner needs to know the name of the union field... unless you're 
restricting this to some future scannerless mode.

> In
> particular because it might cause troublesome conflicts between
> modules.  (BTW, it is so tempting that in the examples above we
> have used type1 in %destructor, %printer, and %type!)

Whoops.

> So:
> 
> %module field1
> {
>   %public { typedef int type1; }
>   %private { void free1 (type1); }
>   %type {type1} a b c
>   %destructor { free1 ($$);  } a b c
>   %printer    { print1 ($$); } a b c
>   %%
>   a: b | c;
>   %%
>   void free1 (type1 t)
>   {
>     free (t);
>   };
> }

Are you going to require a one-to-one correspondence between modules and 
fields?  If so, is this really universally the best way to organize a 
grammar?

Joel