[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Please, support easy AST generation

From: Frank Heckenbach
Subject: Re: Please, support easy AST generation
Date: Sun, 09 Dec 2018 07:47:18 +0100

Askar Safin wrote:

> Hi. Often the only thing I want to do in Bison is just generate
> AST and nothing else. Unfortunately, in this case code becomes
> very repetitive. For example, this is Bison input I used to
> generate AST for small JavaScript subset:
> https://zerobin.net/?b9af68c9aa7a31a9#JB4wZseYTq9aKfZOwwZrDqBNCqlEZRj/+DM9bKdgtKU=
> . Please, support some feature, which eliminates the need to write
> such repetitive code. Such code can be easily generated from
> grammar alone

Indeed, I had discussed something similar with Akim Demaille
privately some time ago; he also mentioned it on this list a few
weeks ago (I guess I'm one of the "some people" he refers to, but
unfortunately I was too busy then to get involved):

: Having a means to ask Bison to generate actions given some form of
: a generic pattern is a different topic.  It makes a lot of sense,
: and some people have discussed about this on various occasions
: here.  That would help to bind to some abstract factory for ASTs
: for instance, without forcing a specific AST format.

So let me restate and expand upon my suggestion here, which is,
of course, just a rough sketch. This refers mainly to the C++ output
as it makes heavy use of templates and overloading. Maybe something
similar can be done in Java etc., but I'll leave that to the
respective experts. In C, I think this will only work in a less
general way which requires more effort by the end-user.

Actually, this would not only mostly automatically generate actions
to generate ASTs and the like, but also some normal actions in my
current grammars that don't primarily build ASTs, but contain
actions that build "mini sub-ASTs", i.e. structures that are later
acted upon in other actions, e.g. parameter lists while parsing a
function declaration or call. I guess such actions might be common
in many grammars.

AFAIK, Bison's current behaviour applied to any rule can be summed
up like this (I hope I'm not missing important details):

1. If a user-specified action is given, output this action and
   nothing else; otherwise:

2. if $$ has no declared type, do nothing; otherwise:

3. if there are no RHS symbols, default-initialize $$ (when Bison
   knows how to, esp. when using variants); otherwise:

4. output the default action "$$ = $1;" (possibly with auto-move),
   warning if the declared types of $$ and types are different
   (which may still result in a successful compilation if the types
   are compatible, or a compiler-error down the line if not).

   As a side note, though I suggest to introduce this default action
   into C++, I'm not sure if it's actually useful to apply it (in
   any language) when there's more than one RHS argument (or at
   least, more than ne typed one, more about this below). But if
   that's required by POSIX, we'd have no choice, at least in C.

My suggestion is basically to insert another step (optional, of
course) to auto-generate actions just based on the types involved
before 3., or even before 2. -- the latter could be useful to
generate actions that consume parsed data (e.g. to compile a
function or to store a declaration in an interpreter) rather than
build structures such as ASTs, e.g.:

1.5: if an auto-action for the types of $$ and $n can be generated,
     do so; otherwise: ...

So how can such an auto-action be specified?
A rather general way may use a form similar to %printer, e.g.:

  %auto-action { build_foo ($$, $^); } <foo>;

Then any rule where $$ is of type foo and which has no explicit
user-action would use build_foo automatically.

  %auto-action { build_default ($$, $^); } <*>;

This would apply to any type of $$ not mentioned explicitly.

I'm using a new special token "$^" here (the name is only exemplary,
in analogy with make) to insert the RHS arguments as a
comma-separated list (again with auto-move if enabled). More
precisely, only those RHS arguments with a declared type, since
(a) Bison wouldn't know how to insert arguments without declared
types anyway (esp. with variants), and (b) automatically omitting
them allows further simplification in a lot of common (IMHO) cases,

  expr: '(' expr ')';

would then auto-generate:

  build_auto ($$, $2);

without having to mention $2 explicitly (assuming the '(' and ')'
tokens have no type), or in a hypothetical language:

  decl: "function" id '(' parlist ')' attrs ';';

would auto-generate:

  build_auto ($$, $2, $4, $6);

discarding all the syntactic fluff automatically.

An even more general way would be to discriminate on the types of $$
and $n, like this:

  %auto-action { build_foo ($$, $1, $2); } <foo, bar, baz>;

This would avoid "$^" because the number of RHS arguments is
specified explicitly. However, (a) I fear this might be a lot of
effort to implement (whereas the discrimination on $$ only can
hopefully reuse the %printer infrastructure), (b) might actually
require more effort by the user to implement more such directives,
and (c) might mainly be of interest for C grammars.

In C++ (and perhaps Java etc.), OTOH, I'd rather let the compiler do
the selection; I might only ever use:

  %auto-action { build_default ($$, $^); } <*>;

to defer it all to a single overloaded and templated function.

So then how could this function be implemented? (Note, the following
is just a collection of ideas, not tested and probably with a number
of more or less hairy issues, but just to convey the idea. I'm
assuming auto-move throughout.)

First a very general version that constructs $$ from the arguments
if possible. This already covers case 3. above (if T has a default
constructor), as well as case 4. with the side note (if T has a copy
or move constructor), but also actions that must just pack their
arguments in some simple structure (e.g. std::pair, std::struct or a
user-defined structure) for later use, or that build some object
using any of its constructors.

  // probably need enable_if<is_constructible ...> here ...
  template<typename T, typename... R>
  void build_default (T& t, R&&... r)
    t = T { std::forward<R> (r)... };

Another overload could automatically build smart pointers (here,
unique_ptr, but likewise for shared_ptr or similar) when applicable:

  template<typename T, typename... R>
  void build_default (std::unique_ptr<T>& t, R&&... r)
    t = std::make_unique (std::forward<R> (r)...);

When building containers (e.g. parameter lists), the rules that
build the start (i.e., a container with 0 or 1 elements, depending
on the grammar) should already be covered by the general version
above using the default constructor or single-element list
initialization. For rules that extend containers, we could have:

  // maybe enable_if<is_container<T>> ...
  template<typename T, typename... R>
  void build_default (T& t, T&& c, R&&... r)
    t = std::move (c);
    t.emplace_back (std::forward<R> (r)...);

Such basic and exemplary overlads could be shipped with Bison,
either in a generated header (optionally), or a static header
distributed with Bison (though I know Bison doesn't install any
headers so far, so this would be something new), or at least in the
documentation, so users would have the tricky parts taken care of

Auto-generating most actions (esp. for AST builders) might then
become as simple as adding one "<*>" action, and declaring all
symbols with suitable types (which is good style anyway), so the
overloads can apply automatically.

User code could add more application-specific overloads where
needed. E.g. in some of my grammars I have a number of rules like
this, one for each precedence group of operators:

  expr: expr or_op  expr { $$ = build_binary_expr ($1, $2, $3); };
  expr: expr and_op expr { $$ = build_binary_expr ($1, $2, $3); };

Then I could instead write just once:

  void build_default (Exp& t, Exp&& a, Op&& b, Exp&& c)
    t = build_binary_expr (std::move (a), std::move (b), std::move (c));

In any case, user code keeps all flexibility. If the general
overloads are not wanted for some rule, one can add a more specific
overload for the types involved, and if that won't do, one can just
write an explicit action like before which will disable all
auto-generation for this rule.

To cover case 2. above, we'd just need a special syntax for "no $$"
("void" won't do with the syntax above, since "void &" is not
valid). E.g., just dropping $$ from the parameter list in this case
(and not covering this case with "<*>"), so e.g. to auto-generate
rules to print everything that's discarded due to lack of an action,
something like:

  %auto-action { do_print ($^); } <>;

:  The thing is...  We need more hands on Bison.

I know, and unfortunately I'll still be busy for the next few weeks,
but I'm optimistic I can find some time to work on Bison next year.

This would be a feature of interest (and immediate use) to me, so I
think I could work on this; esp. the "build_default" implementations
(which is basically normal, though heavily templated, C++
programming), including testing them with my grammars.

For the work required in Bison proper, I might need some help,
though I hope it might be as simple as adding one new directive like


reply via email to

[Prev in Thread] Current Thread [Next in Thread]