Problems with use of headers for multiple grammars in one source (and C++ compilation issues)

bug-bison

From:	Tim Van Holder
Subject:	Problems with use of headers for multiple grammars in one source (and C++ compilation issues)
Date:	Wed, 14 Aug 2013 17:09:49 +0200

Hi,


I'm in the process of upgrading our internal development server from
an old debian machine (frozen in time due to an inability to upgrade
to kernel 2.6) with bison 2.3 and gcc 4.3.2 to new virtualized
hardware running an up-to-date debian/testing, with bison 2.7.12-4996
and gcc 4.8.
We always run bison like:

  bison --defines --debug --name-prefix=foo

(adding in --report=all if bison is >= 1.875).

This results in two failing use cases.


The first seems like a common-ish situation. Our older language
modules use a C-based AST library; this stores the grammar token
as part of a node's identification.
Some tools use multiple grammars (typically, but not exclusively,
for embedded sub-languages); these grammars take care to ensure
that the numeric values for tokens do not clash.

However, something seems to have changed in how the header file for
the grammar is generated that breaks this. When #including two such
headers in one program, none of the tokens of the second grammar are
visible:


  #include "foo1-grammar.h" /* tokens called t_XXX */
  ...
  #include "foo2-grammar.h" /* tokens called c_XXX */
  ...
  if (node->token == c_FOO) do_something(node);

error: c_FOO undeclared (first use this function)

The difference seems to be that bison 2.3 produced a set of #defines
for the token names after the YYTOKENTYPE enum; bison 2.7.12 does not.

In addition, the enum is called yytokentype, not <prefix>tokentype,
so adding #undef YYTOKENTYPE before the second #include does not help.
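The following minimal sketch shows why this used to work. It pastes simplified, hypothetical stand-ins for the two generated headers into one translation unit, laid out the way bison 2.3 emitted them: the enum is guarded by `YYTOKENTYPE`, and the token `#define`s sit outside that guard. The token values (258, 300) and the helper `is_c_foo` are invented for illustration. The sketch is compiled as C++, but the preprocessor behaviour is the same in C.

```cpp
// --- hypothetical stand-in for foo1-grammar.h (bison 2.3 layout) ---
#ifndef YYTOKENTYPE
# define YYTOKENTYPE
enum yytokentype {
  t_FOO = 258,
  t_BAR = 259
};
#endif
// Token macros, emitted after (and outside) the YYTOKENTYPE guard.
#define t_FOO 258
#define t_BAR 259

// --- hypothetical stand-in for foo2-grammar.h (bison 2.3 layout) ---
#ifndef YYTOKENTYPE          // already defined by foo1: this enum is skipped...
# define YYTOKENTYPE
enum yytokentype {
  c_FOO = 300,
  c_BAR = 301
};
#endif
#define c_FOO 300            // ...but these macros still make c_XXX visible
#define c_BAR 301

// Mirrors the "if (node->token == c_FOO)" test from the example above.
constexpr bool is_c_foo(int token) { return token == c_FOO; }
```

Remove the `#define` lines, as the 2.7.12 headers do. The skipped second enum is then the only place `c_FOO` would have been declared, so it ends up undeclared.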

I've now worked around it by wrapping the headers like so:

  #define yytokentype foo1tokentype
  #include "foo1-grammar.h" /* tokens called t_XXX */
  #undef yytokentype
  #undef YYTOKENTYPE
  ...
  #define yytokentype foo2tokentype
  #include "foo2-grammar.h" /* tokens called c_XXX */
  #undef yytokentype
  #undef YYTOKENTYPE

This works, but is certainly not ideal (and would break again if bison
decided to use slightly different enum/macro names).
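This sketch shows why the workaround succeeds. It again uses simplified, hypothetical 2.7-style stand-ins: no token macros, and other header boilerplate omitted. The header text spells the enum tag `yytokentype`, so the temporary `#define` turns the two enums into distinct tags. The `#undef YYTOKENTYPE` then re-opens the guard for the next header. Token values and `is_c_foo` are invented.

```cpp
#define yytokentype foo1tokentype
// --- hypothetical stand-in for foo1-grammar.h (2.7 layout: no token macros) ---
#ifndef YYTOKENTYPE
# define YYTOKENTYPE
enum yytokentype {           // expands to: enum foo1tokentype
  t_FOO = 258,
  t_BAR = 259
};
#endif
#undef yytokentype
#undef YYTOKENTYPE           // let the next header emit its enum

#define yytokentype foo2tokentype
// --- hypothetical stand-in for foo2-grammar.h ---
#ifndef YYTOKENTYPE
# define YYTOKENTYPE
enum yytokentype {           // expands to: enum foo2tokentype, so no tag clash
  c_FOO = 300,
  c_BAR = 301
};
#endif
#undef yytokentype
#undef YYTOKENTYPE

// Same check as in the original example; c_FOO now comes from foo2tokentype.
constexpr bool is_c_foo(int token) { return token == c_FOO; }
```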

There is a similar, more subtle, problem with YYSTYPE - because it also
does not use the prefix, foo2lval in the case above would reuse the
type of foo1lval. Luckily we only use lval in the lexers (which only
include one header), so this has no direct impact for us - but it seems
like an issue nonetheless. In fact, it is worse, since the approach used
above for yytokentype is not possible - the type is not (re)defined
if YYSTYPE is already #defined.
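The `YYSTYPE` case can be sketched the same way. The guard `! defined YYSTYPE && ! defined YYSTYPE_IS_DECLARED` is an assumption about the simplified shape of the 2.7-era header, and both `%union` bodies are invented. Once the first header has declared its union, the second header's union is skipped. `foo2lval` is then declared with foo1's type.

```cpp
// --- hypothetical stand-in for foo1-grammar.h (simplified) ---
#if ! defined YYSTYPE && ! defined YYSTYPE_IS_DECLARED
typedef union YYSTYPE {
  int ival;                  // foo1's %union
} YYSTYPE;
# define YYSTYPE_IS_DECLARED 1
#endif
extern YYSTYPE foo1lval;

// --- hypothetical stand-in for foo2-grammar.h (simplified) ---
#if ! defined YYSTYPE && ! defined YYSTYPE_IS_DECLARED   // false: skipped
typedef union YYSTYPE {
  double dval;               // foo2's %union is never declared
  const char *sval;
} YYSTYPE;
# define YYSTYPE_IS_DECLARED 1
#endif
extern YYSTYPE foo2lval;     // silently gets foo1's union

// Definitions so the sketch links; normally each parser defines its own.
YYSTYPE foo1lval;
YYSTYPE foo2lval;
```

Pre-defining `YYSTYPE` as a macro does not help either. For example, `#define YYSTYPE foo2stype` (a hypothetical name) makes `defined YYSTYPE` true, so the second union is never declared at all.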

In both cases, it seems like using the name prefix instead of YY would
have helped.

Side note: the header now also includes a prototype for yyparse().
For sources including the header solely to get the token names/values,
this adds a new, unexpected, dependency on the type(s) used for
%parse-param.
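Here is a concrete version of that dependency. Assume, hypothetically, that the grammar uses `%parse-param {parser_context& context}`, matching the `fooparse(context)` call in the C++ example further down. A source file that includes the header only for token values must now make `parser_context` nameable first. When the parameter is a reference or pointer, a forward declaration is one way to do that. The stand-in header is simplified.

```cpp
// Hypothetical %parse-param type. A forward declaration suffices because the
// prototype below only takes it by reference.
struct parser_context;

// --- hypothetical stand-in for foo-grammar.h (simplified) ---
#ifndef YYTOKENTYPE
# define YYTOKENTYPE
enum yytokentype {
  t_FOO = 258,
  t_BAR = 259
};
#endif

// The parser prototype now emitted in the header. It names the %parse-param
// type, so it would not compile without the declaration of parser_context.
int fooparse (parser_context& context);
```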

It would be really nice if a bit more thought were put into avoiding
backwards-compatibility breaks like this.



The second case is a bit more exotic - several of our language modules
use a C++ AST library, so the grammars need to be compiled as C++.
In addition, they are placed in namespaces via the %{ %} blocks:

  %{

  // #includes go here

  namespace Initech {
  namespace Languages {
  namespace Foo {

  // prototypes/inline functions go here

  %}

  <grammar>

  %%

  // local functions go here

  Node*
  parse_foo(parser_context& context)
  {
    ...
    fooparse(context); // call the bison-generated parse function
    ...
  }

  } // namespace Foo
  } // namespace Languages
  } // namespace Initech

This has always worked fine.

However, the %{ %} block gets expanded before a bunch of stuff that
contains (conditional) #includes for C library headers (not their C++
versions). This results in compile errors (tried GCC 4.6, 4.7 and 4.8):

error: cannot convert ‘Initech::Languages::Foo::_IO_FILE*’ to ‘FILE* {aka _IO_FILE*}’ for argument ‘1’ to ‘int Initech::Languages::Foo::fprintf(FILE*, const char*, ...)’

To avoid having to change all the grammar files, I now have a script that
adds

  #include <cstddef>
  #include <cstdio>
  #include <cstdlib>

before the "#line 1 ..." line in the output, and this fixes the compile.
But this is a workaround at best; some better solution would be great
(and "use bison's C++ grammar output" is NOT a solution :-)).
Would it help to put the #includes before the user %{ %} block?
Or to support multiple %{ %} blocks with distinct, defined insertion points,
so that the one with the namespaces can come after the #includes (of course,
this would mean that no externally visible definitions (variables/functions)
can come before that expansion).
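This sketch shows why prepending the C++ headers helps. It uses a hypothetical include-guarded stand-in (`DEMO_STDIO_H`, `demo_file`) in place of `<stdio.h>` and `FILE`. System headers are guarded the same way. Once they have been seen at global scope, the later inclusion inside the user's namespaces expands to nothing, and unqualified names resolve to the global declarations. Without the earlier inclusion, the header body is declared inside the namespace and creates a distinct type. That distinct type is the mismatch in the GCC error above.

```cpp
#include <type_traits>

// 1) Stand-in header seen first at global scope, as the prepended
//    "#include <cstdio>" arranges for the real <stdio.h>.
#ifndef DEMO_STDIO_H
#define DEMO_STDIO_H
struct demo_file { int fd; };
#endif

namespace Initech { namespace Languages { namespace Foo {
// 2) The generated code includes the same header again, now inside the
//    user's namespaces. The guard is already defined, so nothing is declared
//    here...
#ifndef DEMO_STDIO_H
#define DEMO_STDIO_H
struct demo_file { int fd; };
#endif
// ...and unqualified lookup finds the global ::demo_file.
using file_type = demo_file;
}}}

namespace Broken {
// Without step 1 the header body would land here instead, declaring a new
// type Broken::demo_file that does not match ::demo_file.
struct demo_file { int fd; };
using file_type = demo_file;
}
```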

Note: I am not currently subscribed to this list, so please keep me in CC.

Current Thread

Problems with use of headers for multiple grammars in one source (and C++ compilation issues), Tim Van Holder
- Re: Problems with use of headers for multiple grammars in one source (and C++ compilation issues), Valentin Tolmer, 2013/08/15
  - Re: Problems with use of headers for multiple grammars in one source (and C++ compilation issues), Tim Van Holder, 2013/08/16
