Re: A polymorphic YYSTYPE for C++ (instead of the %union)

From: Michiel De Wilde
Subject: Re: A polymorphic YYSTYPE for C++ (instead of the %union)
Date: Sun, 17 Jun 2007 00:10:33 +0200

Hi Hans and the Bison maintainers,

> > [I like a solution in the next release of Bison]
> > so we finally get rid of manual object destruction.
> It has been in the wait for years, so do not hold your breath.

All the more reason to work towards a solution! It is okay to have
different opinions on implementation details (polymorphy through C++
templates vs. through inheritance), but let's not start a fight.

My original patch was written because it is currently impossible in
Bison to have these two desirable features /together/ in C++ parsers:
* the automatic destruction of discarded semantic value objects
 (also in actions, which %destructor cannot provide)
* type consistency of semantic value objects guaranteed at compile time
 by the %union mechanism (and %token <...> and %type <...> declarations)
 (To be concrete: when using "%type <a_ptr> mynonterminal", Bison
  guarantees across ALL rules that the semantic value of a "mynonterminal"
  has the type of the a_ptr field of the union. If you use your own
  polymorphic type instead of the %union, this consistency is not enforced.)

The replies of Paul Eggert and Hans Aberg indicate that something more
generic is desired than only support for boost::variant.

Here is a "generic" analysis of Bison support required/useful for any
"polymorphic-type-other-than-a-union" solution.
(1) The definition of YYSTYPE/semantic_type as a polymorphic type
   (usable like a union, but please not a real C union with its limitations)
   --> current support: a real C union{} or any user-defined type
(2) A way to express the type of the semantic value of tokens and nonterminals.
   --> current support: the %token and %type clause indicating a
       C union field between angular brackets
(3) Automatic correct type selection (or enforcement) in actions
   of objects of the polymorphic semantic_type.
   --> current support: field selection of the union: yyval.type_field
(4) Cleanup of discarded (non)terminals
   --> current support: %destructor which takes care of cleanup in all cases
       (but only those cases) that cannot be handled by actions
(5) Defined behavior for rules without actions
   --> current support: $$=$1, undefined behavior when there is no $1.

Now I'll address whether the current support of (1) to (5) above is sufficient.

(1) semantic_type definition
When you don't want to use the %union construct, Bison is already
fully flexible in letting the user define any semantic value type. So
current support for (1) suffices.

(2) (non)terminal type expression
If you want to put actual type names in the %token/%type clauses as
opposed to union field members, you cannot use types with a ">" in
their definition. The typename currently has to match the regular
expression "[^\0\n>]+". A workaround is possible: e.g., "typedef
std::list<int> IntList;" and use IntList instead of list<int> further
on (though not very practical).

(3) Automatic correct type selection.
The current expression "yyval.type_field" limits the semantic_type to
a member-by-member enumeration. This is a far-reaching limitation and
can currently not be overridden. For a boost::variant you would need
"boost::get<type>(yyval)" instead of "yyval.type".

(4) Cleanup of discarded (non)terminals
The generally accepted memory management paradigm in C++ is that the
cleanup of all heap-allocated data related to an object should be
managed by the destructor of the object's class. So (4) is a non-issue
when you use a polymorphic semantic_type with proper destructor

(5) Defined behavior for rules without actions.
In a strongly typed context $$=$1 doesn't make sense when the types of
$$ and $1 are different. Note that the "default action" is currently
/always/ executed in reductions before control is passed to the right
user-defined action. A more appropriate name would be "initialization"
instead of "default action". One should be able to provide some other
initialization.
In the concrete boost::variant solution, the natural initialization is
to assign a default-constructed object of the right type to the
polymorphic object behind $$. This means that a different
initialization is to be done depending on the type of $$. Such a
type-dependent construction cannot be provided by m4 alone but
requires extra C code as well.


These are the proposed patch descriptions, in order of descending
importance. My boost::variant patch set of the original submission
already implements patches 1-3 (but specifically for boost::variant;
not generic).
Please share your ideas on these patch proposals. If I know my work
will not be thrown away (an intention to commit it to cvs), I'm
willing to write the patches.

Patch 1

Allow other type selection mechanisms than yyval.field_name.

For the C and C++ skeletons at least, the field selection is only
present in three macros: b4_lhs_value, b4_rhs_value and
b4_symbol_actions. So this is a very small effort (with high gains).

Implementation: Something along the lines of providing an overridable
b4_typed_lvalue([YYVAL],[TYPE]) with a default definition of
($1[]m4_ifval([$2],[.$2])) , and rewriting b4_lhs_value, b4_rhs_value
and b4_symbol_actions to make use of this macro.
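
To make the two selection forms concrete, here is a rough C++ sketch of
what the generated code would have to look like in each case. It assumes
C++17 and uses std::variant purely as a stand-in for boost::variant
(std::get has the same shape as boost::get). UnionValue, VariantValue
and select_int are hypothetical names, not part of any skeleton.

```cpp
#include <string>
#include <variant>

// With a %union, the skeleton emits field selection such as yyval.ival.
union UnionValue {
  int ival;
  double dval;
};

// Stand-in for a boost::variant semantic type (assumption: C++17).
typedef std::variant<int, std::string> VariantValue;

// Selection by field name: the form the default macro definition yields.
inline int& select_int(UnionValue& yyval) { return yyval.ival; }

// Selection by type: the form an overridden macro would have to yield.
// Unlike the union, a wrong type is detected (std::bad_variant_access).
inline int& select_int(VariantValue& yyval) { return std::get<int>(yyval); }
```

Both overloads return an lvalue, which is what an action needs in order
to assign to $$ or read from $1.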

Patch 2

Allow user-defined type-dependent initialization of nonterminals at
the result of a reduction.

Implementation: Allow the user to define
b4_typed_initialization([YYVAL],[TYPE]) . When this macro is defined,
the default $$=$1 code is replaced by a switch statement on the rule
number with grouped cases for each result type. Each case calls
b4_typed_initialization(yyval,TYPE) for the right TYPE. The default
case (meaning no type defined) calls the macro as
b4_typed_initialization(yyval). This switch statement requires an
extra muscle in src/output.c .
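
As an illustration only, the generated initialization could look roughly
like the sketch below. It assumes C++17 and uses std::variant as a
stand-in for boost::variant; the rule numbers, typed_initialization and
initialize_result are made up for the example.

```cpp
#include <string>
#include <variant>

// Stand-in semantic type; std::monostate plays the role of "no type".
typedef std::variant<std::monostate, int, std::string> semantic_type;

// Hypothetical C++ counterpart of b4_typed_initialization(yyval, TYPE).
template <typename T>
void typed_initialization(semantic_type& yyval) { yyval = T(); }

// Counterpart of b4_typed_initialization(yyval) for untyped results.
inline void typed_initialization(semantic_type& yyval) {
  yyval = std::monostate();
}

// Switch on the rule number, with the cases grouped per result type.
void initialize_result(semantic_type& yyval, int rule) {
  switch (rule) {
    case 2:
    case 3:  // rules whose result type is int
      typed_initialization<int>(yyval);
      break;
    case 4:  // rules whose result type is std::string
      typed_initialization<std::string>(yyval);
      break;
    default:  // rules without a declared result type
      typed_initialization(yyval);
      break;
  }
}
```

The user action that follows can then assume that $$ already holds an
object of the declared type.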

Patch 3

Avoid copying of yyval and yylval when pushing them onto the semantic
stack. Use std::swap instead. This avoids unnecessary deep copies,
e.g., in case the semantic type is an STL container (such as
std::string or std::list<int>).
The template function std::swap (defined in <algorithm>) is fully
standard in C++ and operates correctly for /any/ type without extra code.
This patch IS really necessary to avoid mandatory reference counting
on pointers. If people are uncomfortable with this, we can always hide
it behind a macro "m4_ifdef(b4_swap,...,...)".


Implementation: Change every occurrence of
 yysemantic_stack_.push (yyval);
to
 yysemantic_stack_.push (semantic_type());
followed by a std::swap of yyval with the new top of the stack.
Same goes for yylval.
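
The difference between pushing by copy and pushing by swap can be
sketched as follows. This assumes a std::vector as the stack and
std::string as the semantic type; push_copy and push_swap are
hypothetical helpers, not the skeleton's actual stack interface.

```cpp
#include <algorithm>  // std::swap in C++98
#include <string>
#include <utility>    // std::swap since C++11
#include <vector>

typedef std::string semantic_type;

// Current behaviour: the value is deep-copied onto the stack.
void push_copy(std::vector<semantic_type>& stack, const semantic_type& yyval) {
  stack.push_back(yyval);
}

// Proposed behaviour: push a default-constructed value, then exchange it
// with yyval. The contents move onto the stack without a deep copy, and
// yyval is left holding the default-constructed value.
void push_swap(std::vector<semantic_type>& stack, semantic_type& yyval) {
  stack.push_back(semantic_type());
  std::swap(stack.back(), yyval);
}
```

Note that after push_swap the caller must no longer rely on the old
contents of yyval, which is acceptable because the parser overwrites it
before the next action.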

Patch 4

Adapt src/scan-gram.l so it accepts type names containing balanced
occurrences of angular brackets.

Implementation: by keeping track of the "nesting level" of angular brackets.
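
The nesting-level idea can be sketched in plain C++ as below. This is
only an illustration of the counting logic, not a flex rule for
src/scan-gram.l; scan_type_tag is a hypothetical helper, and it makes no
attempt to handle '<' or '>' appearing as operators inside the tag.

```cpp
#include <string>

// Returns the text between the outermost angle brackets of a tag such as
// "<std::list<int> >". Returns an empty string if the input does not start
// with '<' or if the brackets are not balanced.
std::string scan_type_tag(const std::string& input) {
  if (input.empty() || input[0] != '<')
    return std::string();
  int depth = 0;  // current nesting level of angle brackets
  for (std::string::size_type i = 0; i < input.size(); ++i) {
    if (input[i] == '<') {
      ++depth;
    } else if (input[i] == '>' && --depth == 0) {
      return input.substr(1, i - 1);  // everything inside the outer pair
    }
  }
  return std::string();  // ran out of input before the tag was closed
}
```

With this kind of counting, a tag only ends at the '>' that brings the
nesting level back to zero, so "<std::list<int> >" is read as one tag.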

Docs: all of this will need to be documented.

That's all for today. As always, constructive comments are very welcome.

Kind regards,


2007/6/16, Hans Aberg <address@hidden>:
On 16 Jun 2007, at 00:25, Michiel De Wilde wrote:

>> There is no change needed: I have used a standard C++ polymorphic
>> class hierarchy for years
> I'm very interested. Could you provide an example grammar?

It does not have anything to do with the grammar as such: I use it to
create a polymorphic class hierarchy, and then the grammar is used to
create objects for this class hierarchy. It is a theorem prover.

There is a "class object" which contains a polymorphic pointer to a
virtual C++ hierarchy, also with a reference count acting as a GC.

> How do you
> use the safety-providing automatic type selection feature of Bison in
> this case (the equivalent of automatically choosing the right field of
> the %union)?

I do not use it, but I experimented once with a variation of %union
that only triggers the typing mechanism, which I called %typed. The
Bison typing mechanism merely selects a field in the union based on
the type name, so the idea is to replace this with a macro, as in C++
one should use static_cast or dynamic_cast.
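
A rough sketch of what such a cast-based selection might look like,
assuming C++11. std::shared_ptr stands in for the reference-counted
"class object" described above, and Base, Number, Name and typed_value
are hypothetical names; this is not the actual theorem prover code.

```cpp
#include <memory>
#include <string>

// Illustrative polymorphic hierarchy.
struct Base {
  virtual ~Base() {}
};
struct Number : Base {
  int value;
  explicit Number(int v) : value(v) {}
};
struct Name : Base {
  std::string text;
  explicit Name(const std::string& t) : text(t) {}
};

// Stand-in for the reference-counted object holding a polymorphic pointer.
typedef std::shared_ptr<Base> object;

// Hypothetical replacement for union field selection: picks the value by
// type with dynamic_cast and yields a null pointer on a type mismatch.
template <typename T>
T* typed_value(const object& value) {
  return dynamic_cast<T*>(value.get());
}
```

Here the type check happens at run time, in contrast to the %union
mechanism, which fixes the field per symbol when the parser is generated.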

>> A typed replacement of %union: Then I think the problem is with
>> the default actions, which no longer can be collected to a single
>> entry in the parser "switch" statement.
> Very true. I've addressed this by outputting a muscle (in
> src/output.c) providing C++ code that is executed right before a
> reduction action.

This was originally outside the switch statement, but I think it was
moved into it (I don't remember the details), in order to avoid double
execution of the default rule, which is OK in C, but not in C++,
which may have special copy-constructors (like auto_ptr).

> It initializes the yyval variant to a
> default-constructed object of the right result type.

And a problem with variant is that it does not work with recursive
data types. In addition, if one is writing a polymorphic hierarchy,
it would be strange having to convert back and forth to variants in
the grammar actions. Therefore I am for a more general support.

> By the way, I've not used the "%union" construct for the definition of
> the different types as I saw no way to get rid of the braces in the
> generated muscle.

There is no way to get rid of those braces, but to rewrite the code...

> Personally I think that the "%define variant"
> approach is a fairly clean solution.

...which Akim recently has done for %define.

> In any case, thank you for the quick reply to my submission. I
> sincerely hope that this or another solution will make it to the next
> release of Bison (so we finally get rid of manual object destruction).
> Further comments/test results are very welcome!

It has been in the wait for years, so do not hold your breath.

  Hans Aberg
