[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Multiple, different, %merge of a token sequence
From: |
Derek M Jones |
Subject: |
Re: Multiple, different, %merge of a token sequence |
Date: |
Wed, 03 May 2006 01:19:04 +0100 |
User-agent: |
Thunderbird 1.5 (Windows/20051201) |
Joel,
Thanks for the prompt reply.
I've now given your code a much closer look. This behavior is another
manifestation of a complex Bison GLR problem that I've known about for a
while and am still working on. Unfortunately, I don't believe I've
previously posted about it. It touches on many areas of the GLR
implementation, and I just haven't found the time yet to properly describe
what I know.
Ok. It would be useful to add a warning to the documentation that
this kind of thing sometimes happens (at least for the moment).
Unfortunately, it seems like the most obvious solution will worsen the
parser's time complexity. I'm exploring a more sophisticated solution,
but I'm afraid it'll be a while before I return to it.
I can parse + other not very complicated stuff all the .c files in
the gcc/postgresql/openafs/openmotif/netscape (original C version)
source trees in under 30 cpu seconds. Having this increased to 60
seconds would not be a problem.
If your %merge functions eliminate subtrees, it appears that you are
right: this bug actually does increase subtree sharing for your example
input because it delays elimination until higher up.
For me this is not really a problem because it appears to be rare and I
can ignore the leaked memory caused by not doing the free's. A bigger
potential problem is that the high level merge has to do some analysis
to handle the fact that the expected disambiguation did not happen in
a lower level production. Since my project involves measuring C
source characteristics a small amount of 'random' error can be handled
(ie, I don't worry about doing a correct disambiguation). See
www.knosof.co.uk/cbook/cbook.html
However, node
sharing is a general problem in GLR parsers that can't always be solved by
writing such %merge functions. Specifically, if any subtrees (such as
tokens) exist on a stack before a split and appear on the RHS of
reductions during the split, those subtrees will be shared by both stacks
and you will have to be careful not to doubly free them when the stacks
are merged.
Would I be correct to say this subtree sharing would not occur if
the grammar was left recursive?
My usual solution is node reference counting. Until this bug is fixed, it
looks like you will need that too. I suspect however that you'll
eventually need reference counting regardless of this bug. You may even
need it now for tokens.
My quick fix was to not free up any parse trees inside the merge
functions.
--
Derek M. Jones tel: +44 (0) 1252 520 667
Knowledge Software Ltd mailto:address@hidden
Applications Standards Conformance Testing http://www.knosof.co.uk