help-bison
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Multiple, different, %merge of a token sequence


From: Derek M Jones
Subject: Re: Multiple, different, %merge of a token sequence
Date: Wed, 03 May 2006 01:19:04 +0100
User-agent: Thunderbird 1.5 (Windows/20051201)

Joel,

Thanks for the prompt reply.

I've now given your code a much closer look. This behavior is another manifestation of a complex Bison GLR problem that I've known about for a while and am still working on. Unfortunately, I don't believe I've previously posted about it. It touches on many areas of the GLR implementation, and I just haven't found the time yet to properly describe what I know.

Ok.  It would be useful to add a warning to the documentation that
this kind of thing sometimes happens (at least for the moment).

Unfortunately, it seems like the most obvious solution will worsen the parser's time complexity. I'm exploring a more sophisticated solution, but I'm afraid it'll be a while before I return to it.

I can parse + other not very complicated stuff all the .c files in
the gcc/postgresql/openafs/openmotif/netscape (original C version)
source trees in under 30 cpu seconds.  Having this increased to 60
seconds would not be a problem.

If your %merge functions eliminate subtrees, it appears that you are right: this bug actually does increase subtree sharing for your example input because it delays elimination until higher up.

For me this is not really a problem because it appears to be rare and I
can ignore the leaked memory caused by not doing the free's. A bigger
potential problem is that the high level merge has to do some analysis
to handle the fact that the expected disambiguation did not happen in
a lower level production.  Since my project involves measuring C
source characteristics a small amount of 'random' error can be handled
(ie, I don't worry about doing a correct disambiguation).  See
www.knosof.co.uk/cbook/cbook.html

However, node sharing is a general problem in GLR parsers that can't always be solved by writing such %merge functions. Specifically, if any subtrees (such as tokens) exist on a stack before a split and appear on the RHS of reductions during the split, those subtrees will be shared by both stacks and you will have to be careful not to doubly free them when the stacks are merged.

Would I be correct to say this subtree sharing would not occur if
the grammar was left recursive?

My usual solution is node reference counting. Until this bug is fixed, it looks like you will need that too. I suspect however that you'll eventually need reference counting regardless of this bug. You may even need it now for tokens.

My quick fix was to not free up any parse trees inside the merge
functions.

--
Derek M. Jones                              tel: +44 (0) 1252 520 667
Knowledge Software Ltd                      mailto:address@hidden
Applications Standards Conformance Testing    http://www.knosof.co.uk




reply via email to

[Prev in Thread] Current Thread [Next in Thread]