help-bison

Re: Finding out when a token is consumed


From: Frank Heckenbach
Subject: Re: Finding out when a token is consumed
Date: Mon, 12 May 2003 04:58:13 +0200

Hans Aberg wrote:

> If that works, that seems to be one of the simplest methods. It would be
> easy for you to make your own skeleton file with its own macro right before
> the action switch statement; then supply that file also in a
> distribution. Then you can turn off the location feature.
> 
> In this variation, what you ask for is essentially a statement
>     %preaction <code>
> with <code> executed right before any other action. It would probably not be
> very difficult to supply Bison with such a feature, if somebody is willing
> to supply it. (Essentially what is needed is putting <code> in a M4 macro,
> and then insert that macro in the skeleton files right before the action
> switch statement.)

Perhaps a macro that takes 3 arguments (similar to YYLLOC_DEFAULT),
the number of symbols and a reference to $$ and $1, so it can do
something with the semantic values. (E.g., if someone prefers
$$ = $N instead of $$ = $1 for the default action, this would be
easy to do there.)
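
A rough sketch of what such a three-argument hook could look like,
written as plain C so it can be tried outside of bison. All names
here (DEMO_PREACTION, demo_value, demo_reduce) are made up for
illustration; bison does not provide them. Only the (Current, Rhs, N)
shape is borrowed from YYLLOC_DEFAULT, with Rhs indexed 1..N like
$1..$N.

```c
#include <assert.h>

/* Stand-in for the parser's semantic value type (assumption). */
typedef int demo_value;

/* Hypothetical pre-action hook with the (Current, Rhs, N) shape of
   YYLLOC_DEFAULT.  This variant makes "$$ = $N" the default instead
   of "$$ = $1"; for an empty rule it leaves Current untouched. */
#define DEMO_PREACTION(Current, Rhs, N)   \
  do {                                    \
    if ((N) > 0)                          \
      (Current) = (Rhs)[N];               \
  } while (0)

/* Simulates what a skeleton might do right before the action switch.
   `top` points at the topmost value stack slot, `len` is the number
   of right-hand side symbols of the rule being reduced. */
demo_value demo_reduce(const demo_value *top, int len)
{
  demo_value result = 0;               /* $$ for an empty rule */
  const demo_value *rhs = top - len;   /* so that rhs[1] is $1 */
  assert(len >= 0);
  DEMO_PREACTION(result, rhs, len);
  return result;
}
```

The stack passed to demo_reduce is assumed to have one unused slot
at the bottom, so that `top - len` never points before the array.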

> It would be nice with a feature that tells whether the lookahead is needed
> or not, so one can implement sure context switches. While at tweaking the
> skeleton files, there is a segment:
> 
> yybackup:
> 
>   // Try to make a decision without lookahead:
>   n_ = pact_[state_];
>   if (n_ == pact_ninf_)
>     goto yydefault;
> 
>   // Read a lookahead token.
>   if (lookahead_ == empty_) {
>     YYCDEBUG << "Reading a token: ";
>     lex_();
>   }
> 
> Here you can clearly determine whether a lookahead is used or not. So if
> you can find a good tweak for your purposes, it might be interesting for
> you to report to Bug Bison how you did it so that it can be used as a
> feature in Bison:
> 
> I think it would be of general interest to get a better handling of this
> lookahead problem.

Unless I'm missing something, `yychar == YYEMPTY' should do this,
shouldn't it? I haven't tried this, except for the example in my
original mail, but from the description in the manual (Action
Features) I think it should work.

However, that's only true for parsers with at most one token of
look-ahead. OTOH, GLR can take an arbitrary amount of look-ahead
(from the perspective of the parser it's only one look-ahead token,
but to the semantic actions it appears like more because they're
delayed). So I guess this method generally won't work there. Imagine
a situation like this:

- The parser splits, reads many tokens from the lexer, but doesn't
  perform any actions yet.

- Finally, the parsers unite (or all but one die), and the saved
  actions are run.

- When the first action is run, yychar may be empty or not, but this
  doesn't tell us much, even if we know the state of the lexer
  before the split and currently. Suppose n tokens were lexed in the
  meantime. AFAICS, all we can tell is if yychar is empty, the
  current action may use 0..n tokens and if yychar is not empty, it
  may use 0..(n-1) tokens. That's very little information.
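
The bound in the last item can be written down as a small C helper.
This is only the arithmetic from the paragraph above, not anything
bison provides; the names demo_token_range and demo_tokens_used are
made up for illustration.

```c
#include <assert.h>

/* Range of tokens the action being run may have consumed. */
struct demo_token_range { int min; int max; };

/* n_lexed: tokens read from the lexer between the split and now.
   lookahead_empty: nonzero if yychar == YYEMPTY when the action runs. */
struct demo_token_range demo_tokens_used(int n_lexed, int lookahead_empty)
{
  struct demo_token_range r;
  assert(n_lexed >= 0);
  r.min = 0;
  /* A pending look-ahead token has been lexed but cannot belong to
     the rule being reduced, so it is excluded from the upper bound. */
  r.max = lookahead_empty ? n_lexed : n_lexed - 1;
  if (r.max < 0)
    r.max = 0;
  return r;
}
```

Either way the lower bound stays at 0, which is why the result says
so little about the action actually being run.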

So I think in general we need a more powerful approach to deal with
look-ahead. If we know how many terminals the current rule uses,
this would help. We know the number of terminals+nonterminals used
(yylen, which is also the last argument to YYLLOC_DEFAULT), but I
don't know a way to get the number of terminals only. In principle
this information is available -- the number of the rule used is
known, and each rule has a constant number of terminals in it. But I
don't know if bison stores this number (actually, I suppose it
doesn't, so it would take further code to add it). Alternatively,
one could loop over the top-most n symbols on the stack and check
for each one whether it's a terminal. Again, in principle this
should be possible, but I also doubt whether this information is
readily available. Perhaps someone more familiar with bison's
internals can answer this ...
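
For illustration, the stack-walking variant might look like the
following in plain C. It assumes two things that would have to be
checked against the actual skeleton: a table mapping each parser
state to the symbol that led into it, and a numbering in which all
terminals come before all nonterminals. The generated C parsers
appear to contain something along these lines (yystos and YYNTOKENS),
but those are undocumented internals, so the sketch uses its own
made-up demo_ names and data.

```c
#include <assert.h>

/* Assumption: symbols 0..DEMO_NTOKENS-1 are terminals, the rest are
   nonterminals. */
enum { DEMO_NTOKENS = 4 };

/* Hypothetical table: for each parser state, the symbol whose shift
   or goto entered that state.  The values are made up for the demo. */
static const int demo_state_symbol[] = { 0, 1, 5, 2, 4, 3 };

/* Count the terminals among the topmost `len` entries of the state
   stack; `state_top` points at the topmost entry. */
int demo_count_terminals(const int *state_top, int len)
{
  int count = 0;
  int i;
  assert(len >= 0);
  for (i = 0; i < len; i++) {
    int sym = demo_state_symbol[state_top[-i]];
    if (sym < DEMO_NTOKENS)
      count++;
  }
  return count;
}
```

Called with len set to yylen right before a reduction, this would
give the number of terminals in the rule being reduced, provided the
two assumptions above hold.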

Failing this, I suppose the only reliable method that works with a
GLR parser (as I said, I'm considering using one, so that's relevant
for me) is to store some extra information (in my case about the
directives, but in general at least whether it's a terminal or
nonterminal) for each symbol. This could be either with the semantic
values (but it would have to go into each branch of the %union,
probably not nice), or the location, or some new data structure
(which would again require more changes to the skeleton).

So now I've again arrived at an abuse of the location feature --
this time not primarily for the YYLLOC_DEFAULT macro, but for the
location stack. (And even if one wants to use locations in the
regular way, it seems easier to stuff some extra information into
them than into YYSTYPE because YYLTYPE isn't a union).
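
A minimal sketch of this idea in plain C, assuming the location type
is replaced by a struct with two extra members. The lexer would set
is_terminal to 1 for every token it returns; the default-location
macro then clears it for the nonterminal being built and counts the
terminals directly in the rule. The names demo_ltype,
DEMO_LLOC_DEFAULT and demo_reduce_location are made up; only the
(Current, Rhs, N) shape, with Rhs indexed 1..N, follows
YYLLOC_DEFAULT.

```c
#include <assert.h>

/* Assumed replacement for YYLTYPE: the usual four position fields
   plus two members that have nothing to do with source positions. */
typedef struct demo_ltype {
  int first_line, first_column, last_line, last_column;
  int is_terminal;        /* 1 if set by the lexer, 0 for nonterminals */
  int terminals_in_rule;  /* terminals directly in the reduced rule */
} demo_ltype;

/* Same (Current, Rhs, N) shape as YYLLOC_DEFAULT. */
#define DEMO_LLOC_DEFAULT(Current, Rhs, N)                        \
  do {                                                            \
    int demo_i;                                                   \
    (Current).is_terminal = 0;                                    \
    (Current).terminals_in_rule = 0;                              \
    for (demo_i = 1; demo_i <= (N); demo_i++)                     \
      (Current).terminals_in_rule += (Rhs)[demo_i].is_terminal;   \
    if ((N) > 0) {                                                \
      (Current).first_line = (Rhs)[1].first_line;                 \
      (Current).first_column = (Rhs)[1].first_column;             \
      (Current).last_line = (Rhs)[N].last_line;                   \
      (Current).last_column = (Rhs)[N].last_column;               \
    }                                                             \
  } while (0)

/* Simulates one reduction; rhs[1]..rhs[n] are the locations of the
   right-hand side symbols, rhs[0] is not read. */
demo_ltype demo_reduce_location(const demo_ltype *rhs, int n)
{
  demo_ltype current = {0, 0, 0, 0, 0, 0};
  assert(n >= 0);
  DEMO_LLOC_DEFAULT(current, rhs, n);
  return current;
}
```

In a real grammar an action could then presumably read the count
from @$, since the location of the left-hand side is computed before
the action runs.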

To sum it up, in case it was a little confusing: For an LALR(1)
parser (and probably many other ones), something like %preaction
should work, checking for look-ahead using `yychar == YYEMPTY'.

But with a GLR parser it won't work, and we either need additional
information from bison (which I don't know how to get) or we have to
attach extra data to each token, which might be done most easily in
YYLTYPE. (And then, if we (ab)use YYLTYPE for this, we can just as
well use YYLLOC_DEFAULT for the corresponding code I suppose. This
would then be like my `t4.y' example.)

Frank

-- 
Frank Heckenbach, address@hidden
http://fjf.gnu.de/
GnuPG and PGP keys: http://fjf.gnu.de/plan (7977168E)



