bison-patches

RFC: Extracting the action transformation from the scanner


From: Akim Demaille
Subject: RFC: Extracting the action transformation from the scanner
Date: Thu, 25 Aug 2005 11:27:13 +0200
User-agent: Gnus/5.1007 (Gnus v5.10.7) Emacs/21.4 (gnu/linux)

I'm a bit ambivalent on the following patch.

The starting point is that it might be good to move to using GLR for
our parser.  For instance, it would let us get rid of the id_colon
trick.  But GLR, since it desynchronizes the parser from the scanner,
forbids so-called lexical tie-ins; in other words, the parser and the
scanner can no longer share evolving variables.
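
To make the desynchronization concrete, here is a toy model in C.  It
is only a sketch under stated assumptions: `sketch_observed_length`,
`shared_rule_length` and the one-letter token encoding ('s' for a
symbol, 'a' for an action) are hypothetical and appear nowhere in
Bison.  The scanner bumps a shared counter for every symbol it reads,
and the parser handles each token only after the scanner has run
`lookahead` tokens further.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Shared state of the kind a lexical tie-in relies on: the scanner
   increments it for every symbol it reads.  */
static int shared_rule_length;

/* Hypothetical simulation.  TOKENS encodes the input: 's' is a symbol,
   'a' is an action.  The scanner may run up to LOOKAHEAD tokens ahead
   of the parser.  Return the value of shared_rule_length that the
   parser observes when it processes the first action, or -1 if there
   is no action.  */
static int
sketch_observed_length (const char *tokens, size_t lookahead)
{
  size_t n;
  size_t scanned = 0;
  size_t parsed;

  assert (tokens != NULL);
  n = strlen (tokens);
  shared_rule_length = 0;

  for (parsed = 0; parsed < n; ++parsed)
    {
      /* Let the scanner read everything up to PARSED + LOOKAHEAD.  */
      while (scanned < n && scanned <= parsed + lookahead)
        {
          if (tokens[scanned] == 's')
            ++shared_rule_length;
          ++scanned;
        }
      /* The parser now handles token PARSED.  */
      if (tokens[parsed] == 'a')
        return shared_rule_length;
    }
  return -1;
}
```

With no lookahead the parser sees the count it expects; once the
scanner runs ahead, the shared variable already reflects symbols the
parser has not reached yet.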

Currently the translation of special symbols such as $$, @2 etc. is
done by the scanner.  To do this the scanner needs to know the length
of the rule.  It knows it thanks to the id_colon trick.  Therefore if
we want to get rid of it, the scanner can no longer perform this
transformation.
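
To illustrate why the rule length is needed, here is a minimal sketch
in C of the translation.  It is not the code of the patch:
`sketch_translate` is a hypothetical helper, it ignores `<tag>` types,
`@N` locations and the negative lower bound, and it always emits an
empty type.  Only the output shapes `]b4_lhs_value([...])[` and
`]b4_rhs_value(LENGTH, N, [...])[` are taken from the patch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: rewrite `$$' and `$N' in ACTION into OUT (of
   size OUT_SIZE), given RULE_LENGTH, the number of right-hand side
   symbols.  Return 0 on success, -1 if N exceeds RULE_LENGTH or if
   OUT is too small.  */
static int
sketch_translate (const char *action, int rule_length,
                  char *out, size_t out_size)
{
  size_t used = 0;

  assert (action != NULL && out != NULL && out_size > 0);
  out[0] = '\0';

  while (*action)
    {
      int n;
      if (strncmp (action, "$$", 2) == 0)
        {
          n = snprintf (out + used, out_size - used,
                        "]b4_lhs_value([])[");
          action += 2;
        }
      else if (action[0] == '$'
               && (isdigit ((unsigned char) action[1])
                   || (action[1] == '-'
                       && isdigit ((unsigned char) action[2]))))
        {
          char *end;
          long num = strtol (action + 1, &end, 10);
          /* Without the rule length this check is impossible.  */
          if (num > rule_length)
            return -1;
          n = snprintf (out + used, out_size - used,
                        "]b4_rhs_value(%d, %ld, [])[", rule_length, num);
          action = end;
        }
      else
        {
          /* Any other character is copied verbatim.  */
          n = snprintf (out + used, out_size - used, "%c", *action);
          ++action;
        }
      if (n < 0 || (size_t) n >= out_size - used)
        return -1;
      used += (size_t) n;
    }
  return 0;
}
```

The rule length shows up twice: in the range check and as the first
argument of `b4_rhs_value`, which is why whoever performs this
translation must know it.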

I must confess that I was quite pleased with this, since I'm not fond
of modules that do many things at the same time; here the scanner
scans, protects characters from M4, translates user code, computes
rule lengths, etc.  Since I would like to enable new means to
designate semantic values (using names instead of $1 etc.), moving that
part elsewhere seems to provide some fresh air.  I also cherish the
possibility of seeing the values flow through a "compiler", and having
a means to see the action after scanning but before conversion seemed
nice to me.  It would certainly also help in moving away from M4, since
the reader is no longer concerned with it.
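
As an illustration of the "protects characters from M4" duty, here is
a small standalone sketch in C.  The four replacements are the ones
used by the scanner rules in the patch (`$` becomes `$][`, `@` becomes
`@@`, `[` becomes `@{`, `]` becomes `@}`); the function
`sketch_m4_escape` itself is hypothetical and works on a plain buffer
instead of an obstack.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy CODE into OUT (of size OUT_SIZE), escaping
   the characters that are special to M4.  Return 0 on success, -1 if
   OUT is too small.  */
static int
sketch_m4_escape (const char *code, char *out, size_t out_size)
{
  size_t used = 0;

  assert (code != NULL && out != NULL && out_size > 0);

  for (; *code; ++code)
    {
      char single[2] = { *code, '\0' };
      const char *rep;
      size_t len;

      switch (*code)
        {
        case '$': rep = "$]["; break;
        case '@': rep = "@@";  break;
        case '[': rep = "@{";  break;
        case ']': rep = "@}";  break;
        default:  rep = single; break;
        }

      len = strlen (rep);
      /* Keep one byte for the terminating NUL.  */
      if (used + len >= out_size)
        return -1;
      memcpy (out + used, rep, len);
      used += len;
    }
  out[used] = '\0';
  return 0;
}
```

For instance, `a[i] = $x @ 1;` becomes `a@{i@} = $][x @@ 1;`.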

So the following patch does just this: extract the action processing
parts from the scanner.  But to where?  My first desire was to have
another pass, even after parsing, because I'm fond of simple
multi-pass compilers where the scanner and the parser are just
concerned with reading, and transformations belong to later passes.
But I finally chose to do that in the parser, since it required fewer
changes (for a start?).

I think that the grammar scanner is really simpler this way.  BTW, it
is no longer the case that rule_length and rule->length can differ
for midrule actions, since... midrule actions no longer exist when
the transformation is applied: they have already been converted
into plain anonymous rules.  That's the kind of benefit that explains
why I like multipass compilers with progressive simplification of the
data to compile.

That was for the nice aspects of this patch.  Now there are a couple
of issues.

- It does introduce some code duplication.  Some code could be
  factored out but currently is not (e.g., obstacks in both scan-code
  and scan-gram), and some other pieces are harder to share,
  typically some Flex patterns (splice, tag for instance).  There is
  not much common code though.

- The memory management is much worse: the first obstack keeps all the
  actions verbatim, and another keeps them transformed.  I have not
  fought to get rid of the first one; it is certainly possible, but
  at some point the two must coexist.  OTOH, bison is certainly a
  memory glutton afterwards when computing the automaton, so it's not
  certain that this duplication raises the peak memory
  consumption.

- I'm no longer sure moving to GLR is a nice thing...

GLR is definitely nice when you have to face a dirty grammar.  But we
are in the position of defining this grammar, and it would probably
be sensible to keep it LALR.

Still, independently of this GLR-or-not issue, separating concerns
seems a nice property to me, probably easing the implementation of new
features.

What do you think?

Some technical comments about the patch:
- since I don't know what its future is, I have not written the
  ChangeLog, but the comments above should help in understanding it.
- the handling of locations in scan-code is not done yet.  There is no
  problem, it's just that I decided to handle this last.  The 2 failures
  of the test suite are only due to this.

Index: src/Makefile.am
===================================================================
RCS file: /cvsroot/bison/bison/src/Makefile.am,v
retrieving revision 1.66
diff -u -r1.66 Makefile.am
--- src/Makefile.am 25 Jul 2005 03:12:53 -0000 1.66
+++ src/Makefile.am 25 Aug 2005 08:53:44 -0000
@@ -53,6 +53,7 @@
        reader.c reader.h                         \
        reduce.c reduce.h                         \
        relation.c relation.h                     \
+       scan-code.h scan-code.l                   \
        scan-gram.l                               \
        scan-skel.h scan-skel.l                   \
        state.c state.h                           \
Index: src/output.c
===================================================================
RCS file: /cvsroot/bison/bison/src/output.c,v
retrieving revision 1.235
diff -u -r1.235 output.c
--- src/output.c 24 Jul 2005 07:24:22 -0000 1.235
+++ src/output.c 25 Aug 2005 08:53:44 -0000
@@ -204,7 +204,7 @@
 | Prepare the muscles related to the rules: rhs, prhs, r1, r2, |
 | rline, dprec, merger.                                        |
 `-------------------------------------------------------------*/
-
+/* FIXME: */ int max_left_semantic_context;
 static void
 prepare_rules (void)
 {
Index: src/parse-gram.y
===================================================================
RCS file: /cvsroot/bison/bison/src/parse-gram.y,v
retrieving revision 1.57
diff -u -r1.57 parse-gram.y
--- src/parse-gram.y 25 Jul 2005 03:38:41 -0000 1.57
+++ src/parse-gram.y 25 Aug 2005 08:53:45 -0000
@@ -32,6 +32,7 @@
 #include "quotearg.h"
 #include "reader.h"
 #include "symlist.h"
+#include "scan-code.h"
 
 #define YYLLOC_DEFAULT(Current, Rhs, N)  (Current) = lloc_default (Rhs, N)
 static YYLTYPE lloc_default (YYLTYPE const *, int);
@@ -200,7 +201,8 @@
 
 declaration:
   grammar_declaration
-| PROLOGUE                                 { prologue_augment ($1, @1); }
+| PROLOGUE                         { prologue_augment (translate_code ($1, @1),
+                                                      @1); }
 | "%debug"                                 { debug_flag = true; }
 | "%define" string_content string_content  { muscle_insert ($2, $3); }
 | "%defines"                               { defines_flag = true; }
@@ -215,7 +217,7 @@
   }
 | "%initial-action {...}"
   {
-    muscle_code_grow ("initial_action", $1, @1);
+    muscle_code_grow ("initial_action", translate_symbol_action ($1, @1), @1);
   }
 | "%lex-param {...}"                      { add_param ("lex_param", $1, @1); }
 | "%locations"                             { locations_flag = true; }
@@ -248,15 +250,17 @@
 | "%destructor {...}" symbols.1
     {
       symbol_list *list;
+      const char *action = translate_symbol_action ($1, @1);
       for (list = $2; list; list = list->next)
-       symbol_destructor_set (list->sym, $1, @1);
+       symbol_destructor_set (list->sym, action, @1);
       symbol_list_free ($2);
     }
 | "%printer {...}" symbols.1
     {
       symbol_list *list;
+      const char *action = translate_symbol_action ($1, @1);
       for (list = $2; list; list = list->next)
-       symbol_printer_set (list->sym, $1, list->location);
+       symbol_printer_set (list->sym, action, list->location);
       symbol_list_free ($2);
     }
 | "%default-prec"
@@ -316,7 +320,6 @@
 ;
 
 /* One or more nonterminals to be %typed. */
-
 symbols.1:
   symbol            { $$ = symbol_list_new ($1, @1); }
 | symbols.1 symbol  { $$ = symbol_list_prepend ($1, $2, @2); }
@@ -440,7 +443,7 @@
   /* Nothing.  */
 | "%%" EPILOGUE
     {
-      muscle_code_grow ("epilogue", $2, @2);
+      muscle_code_grow ("epilogue", translate_code ($2, @2), @2);
       scanner_last_string_free ();
     }
 ;
Index: src/reader.c
===================================================================
RCS file: /cvsroot/bison/bison/src/reader.c,v
retrieving revision 1.238
diff -u -r1.238 reader.c
--- src/reader.c 24 Jul 2005 07:24:22 -0000 1.238
+++ src/reader.c 25 Aug 2005 08:53:45 -0000
@@ -21,6 +21,7 @@
    Boston, MA 02110-1301, USA.  */
 
 #include "system.h"
+#include <assert.h>
 
 #include <quotearg.h>
 
@@ -34,6 +35,7 @@
 #include "reader.h"
 #include "symlist.h"
 #include "symtab.h"
+#include "scan-code.h"
 
 static symbol_list *grammar = NULL;
 static bool start_flag = false;
@@ -75,6 +77,8 @@
     !typed ? &pre_prologue_obstack : &post_prologue_obstack;
 
   obstack_fgrow1 (oout, "]b4_syncline(%d, [[", loc.start.line);
+  /* FIXME: Protection of M4 characters missing here.  See
+     output.c:escaped_output.  */
   MUSCLE_OBSTACK_SGROW (oout,
                        quotearg_style (c_quoting_style, loc.start.file));
   obstack_sgrow (oout, "]])[\n");
@@ -370,7 +374,7 @@
 {
   if (current_rule->action)
     grammar_midrule_action ();
-  current_rule->action = action;
+  current_rule->action = translate_rule_action (current_rule, action, loc);
   current_rule->action_location = loc;
 }
 
@@ -413,7 +417,7 @@
             But the former needs to contain more: negative rule numbers. */
          ritem[itemno++] = symbol_number_as_item_number (p->sym->number);
          /* A rule gets by default the precedence and associativity
-            of the last token in it.  */
+            of its last token.  */
          if (p->sym->class == token_sym && default_prec)
            rules[ruleno].prec = p->sym;
          if (p)
@@ -435,7 +439,10 @@
     }
 
   if (itemno != nritems)
-    abort ();
+    {
+      fprintf (stderr, "itemno = %d, nritems = %d\n", itemno, nritems);
+      assert (itemno == nritems);
+    }
 
   if (trace_flag & trace_sets)
     ritem_print (stderr);
Index: src/reader.h
===================================================================
RCS file: /cvsroot/bison/bison/src/reader.h,v
retrieving revision 1.45
diff -u -r1.45 reader.h
--- src/reader.h 25 Jul 2005 03:12:53 -0000 1.45
+++ src/reader.h 25 Aug 2005 08:53:45 -0000
@@ -48,8 +48,8 @@
 extern FILE *gram_out;
 extern int gram_lineno;
 
-# define YY_DECL int gram_lex (YYSTYPE *val, location *loc)
-YY_DECL;
+# define GRAM_LEX_DECL int gram_lex (YYSTYPE *val, location *loc)
+GRAM_LEX_DECL;
 
 
 /* From the parser.  */
Index: src/scan-code.h
===================================================================
RCS file: src/scan-code.h
diff -N src/scan-code.h
--- /dev/null   1 Jan 1970 00:00:00 -0000
+++ src/scan-code.h 25 Aug 2005 08:53:45 -0000
@@ -0,0 +1,42 @@
+/* Bison Action Scanner
+
+   Copyright (C) 2005 Free Software Foundation, Inc.
+
+   This file is part of Bison, the GNU Compiler Compiler.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software
+   Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+   02110-1301  USA
+*/
+
+#ifndef SCAN_CODE_H_
+# define SCAN_CODE_H_
+
+# include "location.h"
+# include "symlist.h"
+
+void code_scanner_free (void);
+
+/* The action A contains $$, $1 etc. referring to the values
+   of the rule R. */
+const char *translate_rule_action (symbol_list *r, const char *a, location l);
+
+/* The action A refers to $$ and @$ only, referring to a symbol. */
+const char *translate_symbol_action (const char *a, location l);
+
+/* The action contains no special escapes, just protect M4 special
+   symbols.  */
+const char *translate_code (const char *a, location l);
+
+#endif /* !SCAN_CODE_H_ */
Index: src/scan-code.l
===================================================================
RCS file: src/scan-code.l
diff -N src/scan-code.l
--- /dev/null   1 Jan 1970 00:00:00 -0000
+++ src/scan-code.l 25 Aug 2005 08:53:45 -0000
@@ -0,0 +1,359 @@
+/* Bison Action Scanner                             -*- C -*-
+
+   Copyright (C) 2005 Free Software Foundation, Inc.
+
+   This file is part of Bison, the GNU Compiler Compiler.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software
+   Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+   02110-1301  USA
+*/
+
+%option debug nodefault nounput noyywrap never-interactive
+%option prefix="code_" outfile="lex.yy.c"
+
+%{
+#include "system.h"
+#include <assert.h>
+
+#include <get-errno.h>
+#include <quote.h>
+
+#include "scan-code.h"
+#include "getargs.h"  /* locations_flag */
+#include "complain.h"
+#include "gram.h"
+#include "reader.h"
+
+/* The current calling start condition: SC_RULE_ACTION or
+   SC_SYMBOL_ACTION. */
+# define YY_DECL const char *code_lex (int sc_context)
+YY_DECL;
+
+static void handle_action_dollar (char *cp, location loc);
+static void handle_action_at (char *cp, location loc);
+static location the_location;
+static location *loc = &the_location;
+
+/* OBSTACK_FOR_STRING -- Used to store all the characters that we need to
+   keep (to construct ID, STRINGS etc.).  Use the following macros to
+   use it.
+
+   Use STRING_GROW to append what has just been matched, and
+   STRING_FINISH to end the string (it puts the ending 0).
+   STRING_FINISH also stores this string in LAST_STRING, which can be
+   used, and which is used by STRING_FREE to free the last string.  */
+
+static struct obstack obstack_for_string;
+
+/* A string representing the most recently saved token.  */
+static char *last_string;
+
+#define STRING_GROW   \
+  obstack_grow (&obstack_for_string, yytext, yyleng)
+
+#define STRING_FINISH                                  \
+  do {                                                 \
+    obstack_1grow (&obstack_for_string, '\0');         \
+    last_string = obstack_finish (&obstack_for_string);        \
+  } while (0)
+
+/* The rule being processed. */
+symbol_list *current_rule;
+%}
+ /* C and C++ comments in code. */
+%x SC_COMMENT SC_LINE_COMMENT
+ /* Strings and characters in code. */
+%x SC_STRING SC_CHARACTER
+ /* Whether in a rule or symbol action.  Specifies the translation
+    of $ and @.  */
+%x SC_RULE_ACTION SC_SYMBOL_ACTION
+
+
+/* POSIX says that a tag must be both an id and a C union member, but
+   historically almost any character is allowed in a tag.  We disallow
+   NUL and newline, as this simplifies our implementation.  */
+tag     [^\0\n>]+
+
+/* Zero or more instances of backslash-newline.  Following GCC, allow
+   white space between the backslash and the newline.  */
+splice  (\\[ \f\t\v]*\n)*
+
+%%
+
+%{
+  /* This scanner is special: it is invoked only once, henceforth
+     is expected to return only once.  This initialization is
+     therefore done once per action to translate. */
+  assert (sc_context == SC_SYMBOL_ACTION
+         || sc_context == SC_RULE_ACTION
+         || sc_context == INITIAL);
+  BEGIN sc_context;
+%}
+
+  /*------------------------------------------------------------.
+  | Scanning a C comment.  The initial `/ *' is already eaten.  |
+  `------------------------------------------------------------*/
+
+<SC_COMMENT>
+{
+  "*"{splice}"/"  STRING_GROW; BEGIN sc_context;
+}
+
+
+  /*--------------------------------------------------------------.
+  | Scanning a line comment.  The initial `//' is already eaten.  |
+  `--------------------------------------------------------------*/
+
+<SC_LINE_COMMENT>
+{
+  "\n"          STRING_GROW; BEGIN sc_context;
+  {splice}      STRING_GROW;
+}
+
+
+  /*--------------------------------------------.
+  | Scanning user-code characters and strings.  |
+  `--------------------------------------------*/
+
+<SC_CHARACTER,SC_STRING>
+{
+  {splice}|\\{splice}. STRING_GROW;
+}
+
+<SC_CHARACTER>
+{
+  "'"          STRING_GROW; BEGIN sc_context;
+}
+
+<SC_STRING>
+{
+  "\""         STRING_GROW; BEGIN sc_context;
+}
+
+
+<SC_RULE_ACTION,SC_SYMBOL_ACTION>{
+  "'" {
+    STRING_GROW;
+    BEGIN SC_CHARACTER;
+  }
+  "\"" {
+    STRING_GROW;
+    BEGIN SC_STRING;
+  }
+  "/"{splice}"*" {
+    STRING_GROW;
+    BEGIN SC_COMMENT;
+  }
+  "/"{splice}"/" {
+    STRING_GROW;
+    BEGIN SC_LINE_COMMENT;
+  }
+}
+
+<SC_RULE_ACTION>
+{
+  "$"("<"{tag}">")?(-?[0-9]+|"$")   handle_action_dollar (yytext, *loc);
+  "@"(-?[0-9]+|"$")                handle_action_at (yytext, *loc);
+}
+
+<SC_SYMBOL_ACTION>
+{
+  "$$"   obstack_sgrow (&obstack_for_string, "]b4_dollar_dollar[");
+  "@$"   obstack_sgrow (&obstack_for_string, "]b4_at_dollar[");
+}
+
+
+  /*-----------------------------------------.
+  | Escape M4 quoting characters in C code.  |
+  `-----------------------------------------*/
+
+<*>
+{
+  \$   obstack_sgrow (&obstack_for_string, "$][");
+  \@   obstack_sgrow (&obstack_for_string, "@@");
+  \[   obstack_sgrow (&obstack_for_string, "@{");
+  \]   obstack_sgrow (&obstack_for_string, "@}");
+}
+
+  /*-----------------------------------------------------.
+  | By default, grow the string obstack with the input.  |
+  `-----------------------------------------------------*/
+
+<*>.|\n        STRING_GROW;
+
+ /* End of processing. */
+<*><<EOF>>        STRING_FINISH; return last_string;
+
+%%
+
+/* Keeps track of the maximum number of semantic values to the left of
+   a handle (those referenced by $0, $-1, etc.) that are required by
+   the semantic actions of this grammar. */
+int max_left_semantic_context = 0;
+
+
+/*------------------------------------------------------------------.
+| TEXT is pointing to a wannabee semantic value (i.e., a `$').      |
+|                                                                   |
+| Possible inputs: $[<TYPENAME>]($|integer)                         |
+|                                                                   |
+| Output to OBSTACK_FOR_STRING a reference to this semantic value.  |
+`------------------------------------------------------------------*/
+
+static void
+handle_action_dollar (char *text, location loc)
+{
+  const char *type_name = NULL;
+  char *cp = text + 1;
+  int rule_length = symbol_list_length (current_rule) - 1;
+
+  /* Get the type name if explicit. */
+  if (*cp == '<')
+    {
+      type_name = ++cp;
+      while (*cp != '>')
+       ++cp;
+      *cp = '\0';
+      ++cp;
+    }
+
+  if (*cp == '$')
+    {
+      if (!type_name)
+       type_name = symbol_list_n_type_name_get (current_rule, loc, 0);
+      if (!type_name && typed)
+       complain_at (loc, _("$$ of `%s' has no declared type"),
+                    current_rule->sym->tag);
+      if (!type_name)
+       type_name = "";
+      obstack_fgrow1 (&obstack_for_string,
+                     "]b4_lhs_value([%s])[", type_name);
+    }
+  else
+    {
+      long int num;
+      set_errno (0);
+      num = strtol (cp, 0, 10);
+      if (INT_MIN <= num && num <= rule_length && ! get_errno ())
+       {
+         int n = num;
+         if (1-n > max_left_semantic_context)
+           max_left_semantic_context = 1-n;
+         if (!type_name && n > 0)
+           type_name = symbol_list_n_type_name_get (current_rule, loc, n);
+         if (!type_name && typed)
+           complain_at (loc, _("$%d of `%s' has no declared type"),
+                        n, current_rule->sym->tag);
+         if (!type_name)
+           type_name = "";
+         obstack_fgrow3 (&obstack_for_string,
+                         "]b4_rhs_value(%d, %d, [%s])[",
+                         rule_length, n, type_name);
+       }
+      else
+       complain_at (loc, _("XXX integer out of range: %s"), quote (text));
+    }
+}
+
+
+/*------------------------------------------------------.
+| TEXT is a location token (i.e., a `@$').  Output to   |
+| OBSTACK_FOR_STRING a reference to this location.      |
+`------------------------------------------------------*/
+
+static void
+handle_action_at (char *text, location loc)
+{
+  char *cp = text + 1;
+  int rule_length = symbol_list_length (current_rule) - 1;
+  locations_flag = true;
+
+  if (*cp == '$')
+    obstack_sgrow (&obstack_for_string, "]b4_lhs_location[");
+  else
+    {
+      long int num;
+      set_errno (0);
+      num = strtol (cp, 0, 10);
+
+      if (INT_MIN <= num && num <= rule_length && ! get_errno ())
+       {
+         int n = num;
+         obstack_fgrow2 (&obstack_for_string, "]b4_rhs_location(%d, %d)[",
+                         rule_length, n);
+       }
+      else
+       complain_at (loc, _("integer out of range: %s"), quote (text));
+    }
+}
+
+
+/*-------------------------.
+| Initialize the scanner.  |
+`-------------------------*/
+
+const char *
+translate_action (int sc_context, const char *a)
+{
+  const char *res;
+  static bool initialized = false;
+  if (!initialized)
+    {
+      obstack_init (&obstack_for_string);
+      /* The initial buffer, never used. */
+      yy_delete_buffer (YY_CURRENT_BUFFER);
+      yy_flex_debug = 0;
+      the_location.start.file   = the_location.end.file   = "FPP";
+      the_location.start.line   = the_location.end.line   = 1;
+      the_location.start.column = the_location.end.column = 0;
+      initialized = true;
+    }
+
+  yy_switch_to_buffer (yy_scan_string (a));
+  res = code_lex (sc_context);
+  yy_delete_buffer (YY_CURRENT_BUFFER);
+  return res;
+}
+
+const char *
+translate_rule_action (symbol_list *r, const char *a, location l)
+{
+  current_rule = r;
+  return translate_action (SC_RULE_ACTION, a);
+}
+
+const char *
+translate_symbol_action (const char *a, location l)
+{
+  return translate_action (SC_SYMBOL_ACTION, a);
+}
+
+const char *
+translate_code (const char *a, location l)
+{
+  return translate_action (INITIAL, a);
+}
+
+/*-----------------------------------------------.
+| Free all the memory allocated to the scanner.  |
+`-----------------------------------------------*/
+
+void
+code_scanner_free (void)
+{
+  obstack_free (&obstack_for_string, 0);
+  /* Reclaim Flex's buffers.  */
+  yy_delete_buffer (YY_CURRENT_BUFFER);
+}
Index: src/scan-gram.l
===================================================================
RCS file: /cvsroot/bison/bison/src/scan-gram.l,v
retrieving revision 1.75
diff -u -r1.75 scan-gram.l
--- src/scan-gram.l 22 Jul 2005 17:58:51 -0000 1.75
+++ src/scan-gram.l 25 Aug 2005 08:53:45 -0000
@@ -49,6 +49,7 @@
   while (0)
 
 /* Pacify "gcc -Wmissing-prototypes" when flex 2.5.31 is used.  */
+#define YY_DECL GRAM_LEX_DECL
 int gram_get_lineno (void);
 FILE *gram_get_in (void);
 FILE *gram_get_out (void);
@@ -104,16 +105,6 @@
   STRING_FREE;
 }
 
-/* Within well-formed rules, RULE_LENGTH is the number of values in
-   the current rule so far, which says where to find `$0' with respect
-   to the top of the stack.  It is not the same as the rule->length in
-   the case of mid rule actions.
-
-   Outside of well-formed rules, RULE_LENGTH has an undefined value.  */
-static int rule_length;
-
-static void handle_dollar (int token_type, char *cp, location loc);
-static void handle_at (int token_type, char *cp, location loc);
 static void handle_syncline (char *args);
 static unsigned long int scan_integer (char const *p, int base, location loc);
 static int convert_ucn_to_byte (char const *hex_text);
@@ -121,11 +112,26 @@
 static void unexpected_newline (boundary, char const *);
 
 %}
-%x SC_COMMENT SC_LINE_COMMENT SC_YACC_COMMENT
-%x SC_STRING SC_CHARACTER
-%x SC_AFTER_IDENTIFIER
+ /* A C-like comment in directives/rules. */
+%x SC_YACC_COMMENT
+ /* Strings and characters in directives/rules. */
 %x SC_ESCAPED_STRING SC_ESCAPED_CHARACTER
-%x SC_PRE_CODE SC_BRACED_CODE SC_PROLOGUE SC_EPILOGUE
+ /* An identifier was just read in directives/rules.  Special state
+    to capture the sequence `identifier :'. */
+%x SC_AFTER_IDENTIFIER
+ /* A keyword that should be followed by some code was read (e.g.
+    %printer). */
+%x SC_PRE_CODE
+
+ /* Three types of user code:
+    - prologue (code between `%{' `%}' in the first section, before %%);
+    - actions, printers, union, etc. (between braces in the middle section);
+    - epilogue (everything after the second %%). */
+%x SC_PROLOGUE SC_BRACED_CODE SC_EPILOGUE
+ /* C and C++ comments in code. */
+%x SC_COMMENT SC_LINE_COMMENT
+ /* Strings and characters in code. */
+%x SC_STRING SC_CHARACTER
 
 letter   [.abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_]
 id       {letter}({letter}|[0-9])*
@@ -221,7 +227,7 @@
   "%nterm"                return PERCENT_NTERM;
   "%output"               return PERCENT_OUTPUT;
   "%parse-param"         token_type = PERCENT_PARSE_PARAM; BEGIN SC_PRE_CODE;
-  "%prec"                 rule_length--; return PERCENT_PREC;
+  "%prec"                 return PERCENT_PREC;
   "%printer"              token_type = PERCENT_PRINTER; BEGIN SC_PRE_CODE;
   "%pure"[-_]"parser"     return PERCENT_PURE_PARSER;
   "%right"                return PERCENT_RIGHT;
@@ -240,13 +246,12 @@
   }
 
   "="                     return EQUAL;
-  "|"                     rule_length = 0; return PIPE;
+  "|"                     return PIPE;
   ";"                     return SEMICOLON;
 
   {id} {
     val->symbol = symbol_get (yytext, *loc);
     id_loc = *loc;
-    rule_length++;
     BEGIN SC_AFTER_IDENTIFIER;
   }
 
@@ -311,7 +316,6 @@
 <SC_AFTER_IDENTIFIER>
 {
   ":" {
-    rule_length = 0;
     *loc = id_loc;
     BEGIN INITIAL;
     return ID_COLON;
@@ -377,7 +381,6 @@
     STRING_FINISH;
     loc->start = token_start;
     val->chars = last_string;
-    rule_length++;
     BEGIN INITIAL;
     return STRING;
   }
@@ -404,7 +407,6 @@
     last_string_1 = last_string[1];
     symbol_user_token_number_set (val->symbol, last_string_1, *loc);
     STRING_FREE;
-    rule_length++;
     BEGIN INITIAL;
     return ID;
   }
@@ -478,7 +480,7 @@
 
 <SC_CHARACTER,SC_STRING>
 {
-  {splice}|\\{splice}[^\n$@\[\]]      STRING_GROW;
+  {splice}|\\{splice}[^\n\[\]] STRING_GROW;
 }
 
 <SC_CHARACTER>
@@ -597,7 +599,6 @@
     if (outer_brace)
       {
        STRING_FINISH;
-       rule_length++;
        loc->start = code_start;
        val->chars = last_string;
        BEGIN INITIAL;
@@ -609,9 +610,6 @@
      (as `<' `<%').  */
   "<"{splice}"<"  STRING_GROW;
 
-  "$"("<"{tag}">")?(-?[0-9]+|"$")  handle_dollar (token_type, yytext, *loc);
-  "@"(-?[0-9]+|"$")               handle_at (token_type, yytext, *loc);
-
   <<EOF>>  unexpected_eof (code_start, "}"); BEGIN INITIAL;
 }
 
@@ -651,19 +649,6 @@
 }
 
 
-  /*-----------------------------------------.
-  | Escape M4 quoting characters in C code.  |
-  `-----------------------------------------*/
-
-<SC_COMMENT,SC_LINE_COMMENT,SC_STRING,SC_CHARACTER,SC_BRACED_CODE,SC_PROLOGUE,SC_EPILOGUE>
-{
-  \$   obstack_sgrow (&obstack_for_string, "$][");
-  \@   obstack_sgrow (&obstack_for_string, "@@");
-  \[   obstack_sgrow (&obstack_for_string, "@{");
-  \]   obstack_sgrow (&obstack_for_string, "@}");
-}
-
-
   /*-----------------------------------------------------.
   | By default, grow the string obstack with the input.  |
   `-----------------------------------------------------*/
@@ -673,11 +658,6 @@
 
 %%
 
-/* Keeps track of the maximum number of semantic values to the left of
-   a handle (those referenced by $0, $-1, etc.) are required by the
-   semantic actions of this grammar. */
-int max_left_semantic_context = 0;
-
 /* Set *LOC and adjust scanner cursor to account for token TOKEN of
    size SIZE.  */
 
@@ -759,176 +739,6 @@
     }
 
   return bytes_read;
-}
-
-
-/*------------------------------------------------------------------.
-| TEXT is pointing to a wannabee semantic value (i.e., a `$').      |
-|                                                                   |
-| Possible inputs: $[<TYPENAME>]($|integer)                         |
-|                                                                   |
-| Output to OBSTACK_FOR_STRING a reference to this semantic value.  |
-`------------------------------------------------------------------*/
-
-static inline bool
-handle_action_dollar (char *text, location loc)
-{
-  const char *type_name = NULL;
-  char *cp = text + 1;
-
-  if (! current_rule)
-    return false;
-
-  /* Get the type name if explicit. */
-  if (*cp == '<')
-    {
-      type_name = ++cp;
-      while (*cp != '>')
-       ++cp;
-      *cp = '\0';
-      ++cp;
-    }
-
-  if (*cp == '$')
-    {
-      if (!type_name)
-       type_name = symbol_list_n_type_name_get (current_rule, loc, 0);
-      if (!type_name && typed)
-       complain_at (loc, _("$$ of `%s' has no declared type"),
-                    current_rule->sym->tag);
-      if (!type_name)
-       type_name = "";
-      obstack_fgrow1 (&obstack_for_string,
-                     "]b4_lhs_value([%s])[", type_name);
-    }
-  else
-    {
-      long int num;
-      set_errno (0);
-      num = strtol (cp, 0, 10);
-
-      if (INT_MIN <= num && num <= rule_length && ! get_errno ())
-       {
-         int n = num;
-         if (1-n > max_left_semantic_context)
-           max_left_semantic_context = 1-n;
-         if (!type_name && n > 0)
-           type_name = symbol_list_n_type_name_get (current_rule, loc, n);
-         if (!type_name && typed)
-           complain_at (loc, _("$%d of `%s' has no declared type"),
-                        n, current_rule->sym->tag);
-         if (!type_name)
-           type_name = "";
-         obstack_fgrow3 (&obstack_for_string,
-                         "]b4_rhs_value(%d, %d, [%s])[",
-                         rule_length, n, type_name);
-       }
-      else
-       complain_at (loc, _("integer out of range: %s"), quote (text));
-    }
-
-  return true;
-}
-
-
-/*----------------------------------------------------------------.
-| Map `$?' onto the proper M4 symbol, depending on its TOKEN_TYPE |
-| (are we in an action?).                                         |
-`----------------------------------------------------------------*/
-
-static void
-handle_dollar (int token_type, char *text, location loc)
-{
-  switch (token_type)
-    {
-    case BRACED_CODE:
-      if (handle_action_dollar (text, loc))
-       return;
-      break;
-
-    case PERCENT_DESTRUCTOR:
-    case PERCENT_INITIAL_ACTION:
-    case PERCENT_PRINTER:
-      if (text[1] == '$')
-       {
-         obstack_sgrow (&obstack_for_string, "]b4_dollar_dollar[");
-         return;
-       }
-      break;
-
-    default:
-      break;
-    }
-
-  complain_at (loc, _("invalid value: %s"), quote (text));
-}
-
-
-/*------------------------------------------------------.
-| TEXT is a location token (i.e., a `@$').  Output to   |
-| OBSTACK_FOR_STRING a reference to this location.      |
-`------------------------------------------------------*/
-
-static inline bool
-handle_action_at (char *text, location loc)
-{
-  char *cp = text + 1;
-  locations_flag = true;
-
-  if (! current_rule)
-    return false;
-
-  if (*cp == '$')
-    obstack_sgrow (&obstack_for_string, "]b4_lhs_location[");
-  else
-    {
-      long int num;
-      set_errno (0);
-      num = strtol (cp, 0, 10);
-
-      if (INT_MIN <= num && num <= rule_length && ! get_errno ())
-       {
-         int n = num;
-         obstack_fgrow2 (&obstack_for_string, "]b4_rhs_location(%d, %d)[",
-                         rule_length, n);
-       }
-      else
-       complain_at (loc, _("integer out of range: %s"), quote (text));
-    }
-
-  return true;
-}
-
-
-/*----------------------------------------------------------------.
-| Map `@?' onto the proper M4 symbol, depending on its TOKEN_TYPE |
-| (are we in an action?).                                         |
-`----------------------------------------------------------------*/
-
-static void
-handle_at (int token_type, char *text, location loc)
-{
-  switch (token_type)
-    {
-    case BRACED_CODE:
-      handle_action_at (text, loc);
-      return;
-
-    case PERCENT_INITIAL_ACTION:
-    case PERCENT_DESTRUCTOR:
-    case PERCENT_PRINTER:
-      if (text[1] == '$')
-       {
-         obstack_sgrow (&obstack_for_string, "]b4_at_dollar[");
-         return;
-       }
-      break;
-
-    default:
-      break;
-    }
-
-  complain_at (loc, _("invalid value: %s"), quote (text));
 }
 
 
Index: src/symtab.c
===================================================================
RCS file: /cvsroot/bison/bison/src/symtab.c,v
retrieving revision 1.65
diff -u -r1.65 symtab.c
--- src/symtab.c 20 Jul 2005 21:17:04 -0000 1.65
+++ src/symtab.c 25 Aug 2005 08:53:45 -0000
@@ -130,7 +130,7 @@
 `------------------------------------------------------------------*/
 
 void
-symbol_destructor_set (symbol *sym, char *destructor, location loc)
+symbol_destructor_set (symbol *sym, const char *destructor, location loc)
 {
   if (destructor)
     {
@@ -147,7 +147,7 @@
 `---------------------------------------------------------------*/
 
 void
-symbol_printer_set (symbol *sym, char *printer, location loc)
+symbol_printer_set (symbol *sym, const char *printer, location loc)
 {
   if (printer)
     {
Index: src/symtab.h
===================================================================
RCS file: /cvsroot/bison/bison/src/symtab.h,v
retrieving revision 1.57
diff -u -r1.57 symtab.h
--- src/symtab.h 12 Jul 2005 15:58:49 -0000 1.57
+++ src/symtab.h 25 Aug 2005 08:53:45 -0000
@@ -60,10 +60,12 @@
   uniqstr type_name;
   location type_location;
 
-  char *destructor;
+  /* Does not own the memory. */
+  const char *destructor;
   location destructor_location;
 
-  char *printer;
+  /* Does not own the memory. */
+  const char *printer;
   location printer_location;
 
   symbol_number number;
@@ -109,10 +111,10 @@
 void symbol_type_set (symbol *sym, uniqstr type_name, location loc);
 
 /* Set the DESTRUCTOR associated with SYM.  */
-void symbol_destructor_set (symbol *sym, char *destructor, location loc);
+void symbol_destructor_set (symbol *sym, const char *destructor, location loc);
 
 /* Set the PRINTER associated with SYM.  */
-void symbol_printer_set (symbol *sym, char *printer, location loc);
+void symbol_printer_set (symbol *sym, const char *printer, location loc);
 
 /* Set the PRECEDENCE associated with SYM.  Ensure that SYMBOL is a
    terminal.  Do nothing if invoked with UNDEF_ASSOC as ASSOC.  */
Index: src/system.h
===================================================================
RCS file: /cvsroot/bison/bison/src/system.h,v
retrieving revision 1.68
diff -u -r1.68 system.h
--- src/system.h 24 Jul 2005 07:24:22 -0000 1.68
+++ src/system.h 25 Aug 2005 08:53:45 -0000
@@ -99,6 +99,8 @@
 # define ATTRIBUTE_UNUSED __attribute__ ((__unused__))
 #endif
 
+#define FUNCTION_PRINT() fprintf (stderr, "%s: ", __func__)
+
 /*------.
 | NLS.  |
 `------*/




