bison-patches
Re: multistart: free choice of the start symbol


From: Akim Demaille
Subject: Re: multistart: free choice of the start symbol
Date: Sat, 21 Nov 2020 15:17:49 +0100

Hi all,

I have left this issue aside for many weeks now.  I had
to work on other issues, but now I feel it's time for me to
work on it again.

I would very much appreciate more opinions on this topic.

I have not yet started to explore Rici's suggestion to use some
kind of structure to exchange with the parser, so it's too soon
for me to report some mileage.

However, I have updates to some of the things I said.

> On 29 Sept. 2020, at 19:20, Akim Demaille <akim.demaille@gmail.com> wrote:
>> On 27 Sept. 2020, at 20:46, Rici Lake <ricilake@gmail.com> wrote:
>> Many parser generators do have the option to parse from various roots. One
>> interesting case is ANTLR, which provides methods for parsing from *every*
>> non-terminal (with names generated from the non-terminal).

> This feature, "start *", would generate considerably larger automata.
> 
> In the case of Bison's own grammar, I get 450 states (that's only x3,
> I was expecting more)

Let me state this again: we "only" go from 169 states to 428, much
less than I had anticipated.  So it's completely doable to introduce
support for "%start *" for instance.

> *and* additional conflicts (because Bison is
> still using LALR for its grammar, so you can still have "subautomata"
> that share states).

This is wrong.  There are no new conflicts.  It's a bit of luck though:
with LALR I was really expecting to get new conflicts, but I didn't get
any.  The reason I had conflicts was that I actually had duplicates in
my list of start symbols...  (A check is sorely needed, indeed).
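
For illustration, here is a minimal sketch of the kind of duplicate check
mentioned above.  It is not Bison's actual code: the function name
find_duplicate_start and the representation of start symbols as an array
of strings are assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from Bison: return the index of the
   first start symbol that repeats an earlier one, or -1 if all the
   names are distinct.  Quadratic, which is acceptable for the short
   lists of start symbols a grammar is likely to declare.  */
int
find_duplicate_start (const char *const *names, size_t count)
{
  assert (names != NULL || count == 0);
  for (size_t i = 1; i < count; ++i)
    for (size_t j = 0; j < i; ++j)
      if (strcmp (names[i], names[j]) == 0)
        return (int) i;
  return -1;
}
```

A caller would report an error for the symbol at the returned index
instead of building an automaton with two identical entry points.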

Bison with the equivalent of "%start *" passes the whole test suite.

> What I did not anticipate though, is that it crashes when generating
> canonical LR on that grammar.  However, I have not yet investigated
> the impact of my changes in IELR and canonical LR, so that's a TODO.

This has been fixed, multiple start symbols and IELR/canonical-LR work
well together.



>> Of course, in a C code generator, you most certainly wouldn't want to
>> generate dozens (or hundreds) of unused interfaces,

I don't see this as a problem.  Especially in the cases you started
from: debugging and/or toying.  In production, "%start *" hardly makes
sense, IMHO.


>> so this kind of feature
>> would be better implemented by a general call which took a non-terminal
>> enumerator as an argument. But that would require that the returned value
>> type be the same regardless of non-terminal, which effectively reduces to
>> the YYSTYPE union (or whatever it happens to be).

As I had mentioned in another reply, all my symbol-specific functions
actually sit on top of a generic one:

typedef struct
{
  YYSTYPE yyvalue;
  int yynerrs;
} yy_parse_impl_t;

int yy_parse_impl (int yychar, yy_parse_impl_t *yyimpl);

I originally meant it to be a private implementation detail, but Rici
convinced me it should be exposed to the user.  It plays the role
of the common function Rici suggested that we use (the one which does
not return a struct).  It also works independently of whether
api.value.type is union or not.
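
To make the layering concrete, here is a minimal sketch of a
symbol-specific function sitting on top of the generic one.  Only
yy_parse_impl_t and the prototype of yy_parse_impl come from this
message.  Everything else is an assumption for illustration: the
stand-in YYSTYPE, the YY_START_* selectors, the reading of the first
argument as a start-symbol selector, the stub body of yy_parse_impl,
and the name and layout of yyparse_exp_t.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in value type; a generated parser defines its own YYSTYPE.  */
typedef union { int ival; double dval; } YYSTYPE;

typedef struct
{
  YYSTYPE yyvalue;
  int yynerrs;
} yy_parse_impl_t;

/* Hypothetical selectors for the start symbol (assumed names).  */
enum { YY_START_EXP = 1, YY_START_NUM = 2 };

/* Stub standing in for the generated generic parser: it parses
   nothing and only fabricates a value so the wrapper can be run.  */
int
yy_parse_impl (int yychar, yy_parse_impl_t *yyimpl)
{
  assert (yyimpl != NULL);
  yyimpl->yynerrs = 0;
  switch (yychar)
    {
    case YY_START_EXP: yyimpl->yyvalue.ival = 42; return 0;
    case YY_START_NUM: yyimpl->yyvalue.dval = 1.5; return 0;
    default: yyimpl->yynerrs = 1; return 1;
    }
}

/* Typed result for one start symbol (hypothetical layout).  */
typedef struct { int yystatus; int yyvalue; int yynerrs; } yyparse_exp_t;

/* Symbol-specific entry point: calls the generic function, then
   extracts the member of the value that matches this symbol's type.  */
yyparse_exp_t
yyparse_exp (void)
{
  yy_parse_impl_t impl;
  yyparse_exp_t res;
  res.yystatus = yy_parse_impl (YY_START_EXP, &impl);
  res.yyvalue = impl.yyvalue.ival;
  res.yynerrs = impl.yynerrs;
  return res;
}
```

In this sketch the typed wrapper is the only place that knows which
member of the value to read.  A caller of the generic function gets
the raw YYSTYPE and must pick the right member itself.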

Cheers!



