Re: symbol type issue with unused non-terminals
From: Scheidler, Balázs
Subject: Re: symbol type issue with unused non-terminals
Date: Fri, 1 Feb 2019 13:27:36 +0100
On Fri, Feb 1, 2019 at 6:56 AM Akim Demaille <address@hidden> wrote:
> Hi Balázs!
>
> > On 31 Jan 2019, at 19:15, Scheidler, Balázs <address@hidden> wrote:
> >
> > Hi,
> >
> > We, in the syslog-ng project (https://github.com/balabit/syslog-ng), have a
> > bison grammar file that contains a number of unused non-terminals. The
> > reasons for this are complicated, which I could explain if needed.
>
> Yes, out of curiosity I'd be happy to know why it makes sense for you
> (there's no plan to refuse grammars with useless symbols!).
>
The reason is that we use bison to parse portions of our configuration
file, but the configuration language is extended with plugins that are
loaded at runtime.
The solution is:
- we have a main grammar that supports the basic configuration language
and a way to trigger on-demand loading of plugins
- each plugin also has its own bison grammar that "includes" rules from the
main grammar file
For this reason:
- there are rules in the main grammar that are only used by plugins,
which leaves unused rules when compiling the main grammar
- when we include these rules in a plugin grammar file, a specific plugin
will not use all of the included rules.
This means that both the main grammar and the plugin grammars will have
some unused rules.
We use a homegrown Python script that grabs the reusable rules and adds
them to the plugin grammar during compilation.
I am happy to elaborate, if more information is needed.
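To make the merge step concrete, here is a minimal sketch of what such a script could look like. It is not our actual script: the marker comments (START_RULES, END_RULES, INCLUDE_RULES), the function names and the toy grammars are all hypothetical and only illustrate the idea of copying a shared rule block into a plugin grammar.

```python
import re

# Hypothetical marker names; the real script's conventions may differ.
BLOCK_RE = re.compile(r"/\* START_RULES \*/\n(.*?)/\* END_RULES \*/", re.DOTALL)
INCLUDE_MARKER = "/* INCLUDE_RULES */"


def extract_reusable_rules(main_grammar: str) -> str:
    """Return the text between the START_RULES and END_RULES markers."""
    match = BLOCK_RE.search(main_grammar)
    if match is None:
        raise ValueError("main grammar has no reusable rule block")
    return match.group(1)


def merge_grammar(main_grammar: str, plugin_grammar: str) -> str:
    """Replace the include marker in the plugin grammar with the shared rules."""
    rules = extract_reusable_rules(main_grammar)
    return plugin_grammar.replace(INCLUDE_MARKER, rules)


# Toy main grammar: 'string' and 'number' are offered to plugins.
MAIN = """%%
start : stmts ;
/* START_RULES */
string : LL_STRING ;
number : LL_NUMBER ;
/* END_RULES */
%%
"""

# Toy plugin grammar: it only uses 'string', so 'number' ends up unused.
PLUGIN = """%%
plugin_start : KW_MYPLUGIN '(' string ')' ;
/* INCLUDE_RULES */
%%
"""

merged = merge_grammar(MAIN, PLUGIN)
print(merged)
```

In this toy case the merged plugin grammar contains both shared rules, but only `string` is referenced, which is exactly how the unused non-terminals arise.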
> > [...]
> > Based on my debugging I've found this root cause:
> >
> > - rules are parsed as part of the grammar, and get an associated symbol
> > number
> > - the RHS of rules reference terminal and non-terminal symbols using a
> > symbol number. These are resolved at grammar read time and the symbol
> > number is generated into the output eventually making it to m4.
> > - at this point reduce_grammar() happens, this removes the unused
> > non-terminal rules, causing symbols to be renumbered.
> > - this makes an effort to update all symbol number references, however
> > RHS of rules is not updated.
> > - RHS of rules that reference "old" numbers that are higher than the
> > maximum, cause those ugly m4 errors that you see above
> > - At the same time, in such a case an RHS expression can easily
> > reference the wrong symbol, if the symbols got renumbered. This is a
> > different manifestation of the same bug, where dollar actions (e.g. $1,
> > $2, etc) start to use an invalid <tag> to reference the value in YYSTYPE.
>
> Thank you for the careful analysis! Yes, you pinpointed the issue.
>
> For the record, something that is very useful to debug such issues is the
> --trace option. In the present case, --trace=muscle would reveal the
> generated symbol numbers. Comparing 3.2 and 3.3 is instructive.
>
I used --trace=muscles while trying to understand what bison does. I was
also reading its code, which I've found pretty easy to read btw.
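For anyone following along, the renumbering problem from my analysis quoted above can be modelled in a few lines. This is only a toy model under my own assumptions, not bison's data structures: reduce_symbols(), resolve() and the symbol names are made up for illustration. It shows both symptoms: a stale number that silently points at the wrong symbol, and a stale number that falls outside the reduced table.

```python
def reduce_symbols(symbols, used):
    """Drop unused symbols; return the kept list and an old -> new number map."""
    kept = [name for name in symbols if name in used]
    old_to_new = {symbols.index(name): new for new, name in enumerate(kept)}
    return kept, old_to_new


def resolve(table, number):
    """Look up a symbol by number; None stands in for an out-of-range error."""
    return table[number] if 0 <= number < len(table) else None


# Symbol table before reduction; 'unused' is a useless non-terminal.
symbols = ["expr", "unused", "term", "factor"]

# Symbol numbers recorded on a rule's RHS at grammar read time.
rhs_term = symbols.index("term")      # 2
rhs_factor = symbols.index("factor")  # 3

kept, old_to_new = reduce_symbols(symbols, used={"expr", "term", "factor"})

# Stale references: the RHS numbers were not updated after renumbering.
stale_term = resolve(kept, rhs_term)      # silently resolves to the wrong symbol
stale_factor = resolve(kept, rhs_factor)  # number is past the end of the table

# Fixed references: map every RHS number through the renumbering table.
fixed_term = resolve(kept, old_to_new[rhs_term])
fixed_factor = resolve(kept, old_to_new[rhs_factor])

print(stale_term, stale_factor, fixed_term, fixed_factor)
```

The out-of-range case corresponds to the m4 errors, while the silently wrong lookup corresponds to the invalid <tag> case.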
>
> > This was triggered in our code-base, because macOS brew updated to bison
> > 3.3.1 recently. If at all possible it would be great if this problem would
> > not spread too far (e.g. Debian). bison 3.2 still seems to work properly.
>
> I'll try to address this asap and release the fix immediately.
> Sorry about this issue.
>
> Our test suite is already quite big, but I regularly discover missing
> cases...
>
>
It's an uphill battle, but still a useful one. I find that tests
(especially if they are fast) give me a lot of self confidence when cutting
releases :)
Bazsi