Re: RFC: generate the default semantic action

From: Rici Lake
Subject: Re: RFC: generate the default semantic action
Date: Mon, 15 Oct 2018 17:15:19 -0500

(Apologies for not replying properly to the mail thread; I wasn't
previously subscribed to this list.)

On Sun, 14 Oct 2018, Akim Demaille wrote:

>    The simplest seems to be actually generating the default semantic
>    action (in all languages/skeletons).  This makes the pre-action (that
>    sets $$ to $1) useless.  But...  maybe some users depend on this, in
>    spite of the comments that clearly warn against this.  So let's not
>    turn this off just yet.

Sorry, what is the "this" which comments clearly warn against? There is a
clear warning in the bison manual (and other places) against relying on $$
to be initialised to anything meaningful in a production with an empty
right-hand-side, but I don't recall having seen any warning against using
the default action, and I believe it is quite common to rely on it.

Please note that the default action is *not* the same as { $$ = $1; }. As
the Bison manual states:

> ... you may also provide a struct rather that a union, which may be handy
> if you want to track information for every symbol (such as preceding

It's not entirely clear how this feature is intended to be used, but it
certainly can be used, and I know that there are bison projects which rely
on it. Here's a small and uninspiring example of how it can be used:

%{
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct AugmentedUnion AugmentedUnion;

struct AugmentedUnion {
  const char* context;   /* extra field carried by every symbol */
  union {
    int num;
    char* str;
  };
};

void yyerror(const char* msg) { fprintf(stderr, "%s\n", msg); }
int yylex(void);
%}

%define api.value.type { AugmentedUnion }

%token <str> WORD <num> NUMBER
%type <str> word
%type <num> number expr

%%

prog: %empty
    | prog word        { printf("!%s %s\n", $<context>2, $2); }
    | prog expr word   { printf("!%s %d %s\n", $<context>2, $2, $3); }

expr: number
    | expr '+' number  { $$ = $1 + $3; }
    | '(' expr ')'     { $$ = $2; }

word:   WORD
number: NUMBER

%%

int yylex(void) {
  for (;;) {
    int ch = getchar();
    if (ch == EOF) return 0;
    /* "!text" sets the context that rides along with the next token */
    else if (ch == '!')   { scanf("%ms", &yylval.context); continue; }
    else if (isspace(ch)) continue;
    else if (isalpha(ch)) { ungetc(ch, stdin); scanf("%ms", &yylval.str); return WORD; }
    else if (isdigit(ch)) { ungetc(ch, stdin); scanf("%d", &yylval.num); return NUMBER; }
    else return ch;
  }
}

int main(int argc, char** argv) { return yyparse(); }

That's all pretty pointless, but it shows how the extra data field can be
bubbled up through the parse using the default action (which copies the
entire semantic value, not just the typed field). This is particularly
apparent in the production for parenthesized exprs, in which the context
returned for the production is the one copied from the "untyped" '(' token,
not the one from the expr. (The assumption is that this was intended, of
course. In a real example, it certainly might be.) The fact that it works
even with "untyped" symbols is an important feature; it means that
auxiliary information can be provided and passed around without requiring a
rather tedious set of %type declarations for all of the grammar's keyword
and character tokens.

This patch won't affect programs which rely on this feature, as far as I
can see. But I wanted to highlight the issue because the road that follows
this patch might inadvertently affect existing programs which count on the
default action to be a copy of the *entire* semantic value.
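
To make the distinction concrete, here is a minimal sketch in plain C
(not actual Bison output). The function names reduce_whole_copy,
reduce_typed_copy and reduce_paren are made up for illustration, and `rhs`
is only a stand-in for the slice of the parser's value stack holding the
right-hand side. It assumes C11 for the anonymous union.

```c
#include <assert.h>
#include <string.h>

/* Same shape as the AugmentedUnion in the grammar above. */
typedef struct AugmentedUnion {
  const char *context;
  union {
    int num;
    char *str;
  };
} AugmentedUnion;

/* Model of the pre-action: the whole value of the first right-hand-side
   symbol is copied, context included. */
AugmentedUnion reduce_whole_copy(const AugmentedUnion *rhs, int len) {
  assert(len >= 1);
  return rhs[0];
}

/* Model of a { $$ = $1; } action with tag <num> and NO pre-action:
   only the num member is assigned. The result is zeroed first purely to
   make the sketch deterministic; a real parser would leave the other
   members unspecified. */
AugmentedUnion reduce_typed_copy(const AugmentedUnion *rhs, int len) {
  AugmentedUnion result;
  assert(len >= 1);
  memset(&result, 0, sizeof result);
  result.num = rhs[0].num;
  return result;
}

/* Model of  expr: '(' expr ')' { $$ = $2; }  with the pre-action in
   place: context comes from '(' and num from the inner expr. */
AugmentedUnion reduce_paren(const AugmentedUnion *rhs, int len) {
  AugmentedUnion result;
  assert(len == 3);
  result = rhs[0];          /* pre-action: whole copy of '(' */
  result.num = rhs[1].num;  /* user action: typed assignment */
  return result;
}
```

With a right-hand side whose first symbol carries a context, the whole copy
keeps that context while the typed copy drops it, which is exactly the
behaviour existing grammars might be relying on.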


A couple of notes about this usage:

This is not the only way to accomplish this goal. Another mechanism I've
tried is to put the shared part of the semantic value into the location
object. That works, too, but it requires explicit modification of
YYLLOC_DEFAULT because the default location action only knows about the
specified fields of yylloc. Also, it seems to break encapsulation to make
the location object semantic.
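
For reference, a rough sketch of what that alternative might look like.
Bison's documented hook for the default location action in the C skeleton
is the YYLLOC_DEFAULT(Cur, Rhs, N) macro, and YYRHSLOC(Rhs, K) is how the
skeleton indexes the right-hand-side locations. Everything else here is an
assumption for illustration: the ContextLoc type, its `context` member, the
merge_locations wrapper, and the choice to inherit the context from the
preceding symbol on an empty rule. In a real grammar the struct would be
installed as YYLTYPE instead of being used standalone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical location type: the four standard fields plus a
   `context` member that Bison itself knows nothing about. */
typedef struct ContextLoc {
  int first_line, first_column, last_line, last_column;
  const char *context;
} ContextLoc;

/* The generated parser normally supplies this; repeated here so the
   sketch compiles on its own. Index 1 is the first RHS symbol and
   index 0 is the symbol just before the right-hand side. */
#ifndef YYRHSLOC
# define YYRHSLOC(Rhs, K) ((Rhs)[K])
#endif

/* Same span computation as the stock macro, plus propagation of the
   extra field (from the first RHS symbol, or from the preceding symbol
   when the rule is empty). */
#define YYLLOC_DEFAULT(Cur, Rhs, N)                               \
  do {                                                            \
    if (N) {                                                      \
      (Cur).first_line   = YYRHSLOC(Rhs, 1).first_line;           \
      (Cur).first_column = YYRHSLOC(Rhs, 1).first_column;         \
      (Cur).last_line    = YYRHSLOC(Rhs, N).last_line;            \
      (Cur).last_column  = YYRHSLOC(Rhs, N).last_column;          \
      (Cur).context      = YYRHSLOC(Rhs, 1).context;              \
    } else {                                                      \
      (Cur).first_line   = (Cur).last_line   =                    \
        YYRHSLOC(Rhs, 0).last_line;                               \
      (Cur).first_column = (Cur).last_column =                    \
        YYRHSLOC(Rhs, 0).last_column;                             \
      (Cur).context      = YYRHSLOC(Rhs, 0).context;              \
    }                                                             \
  } while (0)

/* Hypothetical wrapper so the macro can be exercised outside a parser.
   rhs[0] is the symbol before the rule; rhs[1..n] are its symbols. */
ContextLoc merge_locations(const ContextLoc *rhs, int n) {
  ContextLoc cur = {0, 0, 0, 0, NULL};
  assert(n >= 0);
  YYLLOC_DEFAULT(cur, rhs, n);
  return cur;
}
```

The extra line in each branch is the "explicit modification": without it
the merged location would span the right columns but carry no context.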

Let's suppose that this is a useful feature (based on the fact that it is
being used). Although it works after a fashion here, there is no way of
expressing the action "copy the entire semantic value of the second
symbol". If there were a way of writing that, it could be used as the
default action instead of `{ $$ = $1 }`. One syntax I was thinking about is
the untag `<>`, as in `$<>$ = $<>1;`. It's a bit punctuation-heavy but the
meaning seems clear. It would make it possible to write `$<context>$` as
`$<>$.context`, which you might or might not think is more readable. It
would also allow a symbol to be defined with `%type <> sym` if the normal
use case for that symbol was to reference more than one tag. So maybe there
is something there.

Finally, bison is not entirely happy with the use of "untyped" symbols;
under some circumstances it will complain that the default action causes an
untyped symbol to be given a semantic value. I understand the reason for
this complaint, but I'm not entirely sympathetic. Various times I've wanted
to be able to just tell bison "use <foo> as a tag for any token which I
haven't explicitly assigned a type"; in practice, I have a script which
generates %token declarations but I wouldn't turn down an explicit
declaration, such as `%define api.value.default-token-type <foo>`.
