bison-patches

[PATCH 1/1] add java push parsing


From: Dennis Heimbigner
Subject: [PATCH 1/1] add java push parsing
Date: Tue, 29 Jan 2013 14:00:35 -0700
User-agent: Thunderbird 2.0.0.24 (Windows/20100228)

Add push parsing support to the Java skeleton.

* data/lalr1.java

  1. Capture the parser-state declarations in an m4 macro
  (b4_define_state) to avoid duplicating them.  When push
  parsing, the declarations become instance fields of the
  parser class rather than local variables of parse().

  2. The fields are initialized by push_parse_initialize(),
  which push_parse() calls on its first invocation and again
  on the first call after a parse ends in YYACCEPT or YYABORT.

  3. The body of the parse loop is modified so that, when push
  parsing, it returns a status code (YYACCEPT, YYABORT or
  YYMORE) at the appropriate points instead of a boolean.  To
  make this work, YYNEWSTATE is split into two states:
  YYNEWSTATE and YYGETTOKEN.  On the first call to
  push_parse(), the loop starts in YYNEWSTATE; on later calls
  within the same parse it starts in YYGETTOKEN.  The
  YYNEWSTATE switch arm falls through into YYGETTOKEN, which
  is where a new lookahead token may be needed.  A pull
  parser obtains that token by calling yylex(); the push
  parser instead uses the token passed to push_parse() and,
  once that token has been consumed, returns YYMORE to the
  caller.  The next call to push_parse() resumes in the
  YYGETTOKEN state.

* tests/javapush.at: new test file for Java push parsing

* tests/testsuite.at: add invocation of javapush.at

* tests/local.mk: add javapush.at to distribution
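The control flow described in items 1 to 3 can be illustrated with a small hand-written sketch. It assumes a made-up toy language, NUM ("," NUM)* EOF, and a hypothetical class ToyPushParser with invented token codes and status values; it is not the code the skeleton generates and uses no LALR tables. It only mirrors the three ideas: state held in fields, lazy initialization on the first push, and a NEWSTATE label that falls through into GETTOKEN, which returns MORE once the pushed token has been consumed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy, not Bison output: a push "parser" for NUM ("," NUM)* EOF.
class ToyPushParser {
  // Status codes; the names echo the patch, the values are arbitrary.
  static final int ACCEPT = 0;
  static final int ABORT = 1;
  static final int MORE = 4;

  // Token kinds for the toy language.
  static final int EOF = 0;
  static final int NUM = 1;
  static final int COMMA = 2;

  // Internal resume labels, playing the roles of YYNEWSTATE / YYGETTOKEN.
  private static final int NEWSTATE = 10;
  private static final int GETTOKEN = 11;

  // State kept in fields so it survives between push() calls.
  private boolean initialized = false;
  private boolean expectNum;
  private final List<Integer> values = new ArrayList<>();

  // Counterpart of push_parse_initialize(): run on the first push of a parse.
  private void initialize() {
    expectNum = true;
    values.clear();
    initialized = true;
  }

  // End the current parse; the next push() starts a fresh one.
  private int finish(int status) {
    initialized = false;
    return status;
  }

  // Feed one token; returns ACCEPT, ABORT, or MORE.
  int push(int token, int value) {
    int label;
    boolean haveToken = true; // plays the role of havenexttoken in the patch
    if (!initialized) {
      initialize();
      label = NEWSTATE;
    } else {
      label = GETTOKEN;
    }
    for (;;) {
      switch (label) {
        case NEWSTATE:
          // A real LALR parser would consult its tables here.
          // Fall through.
        case GETTOKEN:
          if (!haveToken) {
            return MORE; // suspend; the next push() resumes at GETTOKEN
          }
          haveToken = false;
          if (expectNum) {
            if (token != NUM) return finish(ABORT);
            values.add(value);
            expectNum = false;
          } else if (token == COMMA) {
            expectNum = true;
          } else if (token == EOF) {
            return finish(ACCEPT);
          } else {
            return finish(ABORT);
          }
          label = NEWSTATE;
          break;
        default:
          throw new IllegalStateException("unknown label " + label);
      }
    }
  }

  List<Integer> values() {
    return values;
  }

  public static void main(String[] args) {
    ToyPushParser p = new ToyPushParser();
    int[][] tokens = {{NUM, 1}, {COMMA, 0}, {NUM, 2}, {EOF, 0}};
    int status = MORE;
    for (int[] t : tokens) {
      status = p.push(t[0], t[1]);
      if (status != MORE) break;
    }
    System.out.println(status == ACCEPT ? "accepted " + p.values() : "rejected");
  }
}
```

Running main prints "accepted [1, 2]". Because finish() clears the initialized flag, the same object starts a new parse on its next push, much as the patch resets push_parse_initialized on YYACCEPT and YYABORT.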
---
 NEWS               |    3 +
 data/lalr1.java    |  280 +++++++++++++++++++++++++++++++++++++++++++---------
 doc/bison.texi     |   73 ++++++++++++++
 tests/local.mk     |    1 +
 tests/testsuite.at |    1 +
 5 files changed, 309 insertions(+), 49 deletions(-)
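
The caller-facing contract described in the doc/bison.texi hunk below (push_parse returns YYMORE until it has seen enough input, and api.push-pull both adds a parse() that loops over the scanner) can be sketched in the same spirit. BothModeSketch, its Lexer interface, and its balanced-parentheses "grammar" are hypothetical stand-ins, not Bison's YYParser API. The push side only ever calls yyerror(), which is why a push-only parser can get by with a lexer whose other methods are stubs.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch; names and token codes are invented for illustration.
class BothModeSketch {
  static final int ACCEPT = 0, ABORT = 1, MORE = 4;
  static final int EOF = 0, LPAREN = 1, RPAREN = 2;

  // Hypothetical minimal lexer interface; Bison's YYParser.Lexer has more methods.
  interface Lexer {
    int yylex();
    void yyerror(String msg);
  }

  private final Lexer lexer;
  private int depth = 0; // push-side state, kept in a field

  BothModeSketch(Lexer lexer) {
    this.lexer = lexer;
  }

  // Push side: consume exactly one token. Only yyerror() is used here.
  int push_parse(int token) {
    switch (token) {
      case LPAREN:
        depth++;
        return MORE;
      case RPAREN:
        if (depth == 0) {
          lexer.yyerror("unexpected ')'");
          return ABORT;
        }
        depth--;
        return MORE;
      case EOF: {
        int status = (depth == 0) ? ACCEPT : ABORT;
        if (status == ABORT) lexer.yyerror("unclosed '('");
        depth = 0; // reset so the object can parse again
        return status;
      }
      default:
        lexer.yyerror("unknown token " + token);
        depth = 0;
        return ABORT;
    }
  }

  // Pull side: the loop a "both" parser runs, feeding push_parse until it is done.
  boolean parse() {
    if (lexer == null) throw new NullPointerException("Null Lexer");
    int status;
    do {
      status = push_parse(lexer.yylex());
    } while (status == MORE);
    return status == ACCEPT;
  }

  // Helper: a lexer that replays a fixed list, then returns EOF forever.
  static Lexer fromTokens(List<Integer> tokens) {
    final Iterator<Integer> it = tokens.iterator();
    return new Lexer() {
      public int yylex() { return it.hasNext() ? it.next() : EOF; }
      public void yyerror(String msg) { System.err.println(msg); }
    };
  }

  public static void main(String[] args) {
    // prints true
    System.out.println(new BothModeSketch(
        fromTokens(Arrays.asList(LPAREN, LPAREN, RPAREN, RPAREN, EOF))).parse());
    // prints false (and reports "unclosed '('" on stderr)
    System.out.println(new BothModeSketch(
        fromTokens(Arrays.asList(LPAREN, EOF))).parse());
  }
}
```

The design choice mirrors the patch: the pull interface is a thin loop over the push interface, so only one copy of the parsing logic exists.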

diff --git a/NEWS b/NEWS
index 62f834b..405f3af 100644
--- a/NEWS
+++ b/NEWS
@@ -228,6 +228,9 @@ GNU Bison NEWS
   is possible to add code to the parser's constructors using "%code init"
   and "%define init_throws".

+  The Java skeleton (data/lalr1.java) now supports push parsing.
+  Contributed by Dennis Heimbigner.
+
 ** C++ skeletons improvements

 *** The parser header is no longer mandatory (lalr1.cc, glr.cc)
diff --git a/data/lalr1.java b/data/lalr1.java
index 187580a..a3801a8 100644
--- a/data/lalr1.java
+++ b/data/lalr1.java
@@ -15,6 +15,34 @@
 # You should have received a copy of the GNU General Public License
 # along with this program.  If not, see <http://www.gnu.org/licenses/>.

+dnl Modified by Dennis Heimbigner (address@hidden)
+dnl to support push parsing.
+dnl
+dnl changes:
+dnl
+dnl 1. capture the declarations as m4 macros.
+dnl This was done to avoid duplication.
+dnl When push parsing, the declarations occur at
+dnl the class instance level rather than within the parse() function.
+dnl
+dnl 2. Initialization of the declarations occurs in a function
+dnl called push_parse_initialize() that is called on the first
+dnl invocation of push_parse().
+dnl
+dnl 3. The body of the parse loop is modified to return values at
+dnl appropriate points when doing push parsing.  In order to
+dnl make push parsing work, it was necessary to divide
+dnl YYNEWSTATE into two states: YYNEWSTATE and YYGETTOKEN. On
+dnl the first call to push_parse, the state is YYNEWSTATE. On
+dnl all later entries, the state is set to YYGETTOKEN. The
+dnl YYNEWSTATE switch arm falls through into
+dnl YYGETTOKEN. YYGETTOKEN indicates that a new token is
+dnl potentially needed. Normally, with a pull parser, this new
+dnl token would be obtained by calling yylex(). In the push
+dnl parser, the value YYMORE is returned to the caller. On the
+dnl next call to push_parse(), the parser will return to the
+dnl YYGETTOKEN state and continue operation.
+
 m4_include(b4_pkgdatadir/[java.m4])

 b4_defines_if([b4_fatal([%s: %%defines does not make sense in Java],
@@ -29,6 +57,47 @@ m4_define([b4_symbol_no_destructor_assert],
                         [b4_skeleton],
[b4_symbol_action_location([$1], [destructor])])])])
 b4_symbol_foreach([b4_symbol_no_destructor_assert])
+dnl
+dnl # Check the value of %define api.push-pull.
+b4_percent_define_default([[api.push-pull]], [[pull]])dnl
+b4_percent_define_check_values([[[[api.push-pull]],
+                                 [[pull]], [[push]], [[both]]]])dnl
+b4_define_flag_if([pull]) m4_define([b4_pull_flag], [[1]])dnl
+b4_define_flag_if([push]) m4_define([b4_push_flag], [[1]])dnl
+m4_case(b4_percent_define_get([[api.push-pull]]),
+        [pull], [m4_define([b4_push_flag], [[0]])],
+        [push], [m4_define([b4_pull_flag], [[0]])])dnl
+dnl
+dnl # Handle BISON_USE_PUSH_FOR_PULL for the test suite.  So that push parsing
+dnl # tests function as written, do not let BISON_USE_PUSH_FOR_PULL modify the
+dnl # behavior of Bison at all when push parsing is already requested.
+b4_define_flag_if([use_push_for_pull])dnl
+b4_use_push_for_pull_if([b4_push_if([m4_define([b4_use_push_for_pull_flag], [[0]])],[m4_define([b4_push_flag], [[1]])])])dnl
+m4_define([b4_both_if],[b4_push_if([b4_pull_if([$1],[$2])],[$2])])dnl
+m4_define([b4_throw_exception],[m4_ifval($1,[throw new $1 ( $2 );],[$3;])])
+
+m4_define([b4_define_state],[[
+    /// Lookahead and lookahead in internal form.
+    int yychar = yyempty_;
+    int yytoken = 0;
+
+    /* State.  */
+    int yyn = 0;
+    int yylen = 0;
+    int yystate = 0;
+    YYStack yystack = new YYStack ();
+
+    /* Error handling.  */
+    int yynerrs_ = 0;
+    ]b4_locations_if([/// The location where the error started.
+    b4_location_type yyerrloc = null;
+
+    /// b4_location_type of the lookahead.
+    b4_location_type yylloc = new b4_location_type (null, null);])[
+
+    /// Semantic value of the lookahead.
+    ]b4_yystype[ yylval = null;
+]])

 b4_output_begin([b4_parser_file_name])
 b4_copyright([Skeleton implementation for Bison LALR(1) parsers in Java],
@@ -344,6 +413,12 @@ b4_lexer_if([[
    * return failure (<tt>false</tt>).  */
   public static final int YYABORT = 1;

+]b4_push_if([
+  /**
+   * Returned by push_parse when the parser needs another token to
+   * continue; call push_parse again with the next token.  */
+  public static final int YYMORE = 4;])[
+
   /**
    * Returned by a Bison action in order to start error recovery without
    * printing an error message.  */
@@ -357,9 +432,13 @@ b4_lexer_if([[
   private static final int YYREDUCE = 6;
   private static final int YYERRLAB1 = 7;
   private static final int YYRETURN = 8;
+]b4_push_if([  private static final int YYGETTOKEN = 9;])[

   private int yyerrstatus_ = 0;

+]b4_push_if([dnl
+  // if using push parser, define state as instance variables
+b4_define_state])[
   /**
    * Return whether error recovery is being done. In this state, the parser
    * reads token until it reaches a known state, and then restarts normal
@@ -463,6 +542,8 @@ b4_lexer_if([[
               + (yyvaluep == null ? "(null)" : yyvaluep.toString ()) + ")");
   }

+]b4_push_if([],[
+dnl Define the core pull parse procedure header when pull only
   /**
    * Parse input from the scanner that was specified at object construction
    * time.  Return whether the end of the input was reached successfully.
@@ -470,46 +551,58 @@ b4_lexer_if([[
    * @@return <tt>true</tt> if the parsing succeeds. Note that this does not
    *          imply that there were no syntax errors.
    */
-  public boolean parse () ]b4_maybe_throws([b4_list2([b4_lex_throws], [b4_throws])])[
+  public boolean parse () b4_maybe_throws([b4_list2([b4_lex_throws], [b4_throws])])])[
+]b4_push_if([
+  /**
+   * Push-parse one token supplied by an external lexer.
+   *
+   * @@param yylextoken current token
+   * @@param yylexval current lval
+b4_locations_if([   * @@param yylexloc current location])
+   *
+   * @@return <tt>YYACCEPT, YYABORT, YYMORE</tt>
+   */
+b4_locations_if([
+  public int push_parse (int yylextoken, b4_yystype yylexval, b4_location_type yylexloc)],[dnl
+  public int push_parse (int yylextoken, b4_yystype yylexval)])
+      b4_maybe_throws([b4_list2([b4_lex_throws], [b4_throws])])])[
   {
-    /// Lookahead and lookahead in internal form.
-    int yychar = yyempty_;
-    int yytoken = 0;
-
-    /* State.  */
-    int yyn = 0;
-    int yylen = 0;
-    int yystate = 0;
-
-    YYStack yystack = new YYStack ();
-
-    /* Error handling.  */
-    int yynerrs_ = 0;
-    ]b4_locations_if([/// The location where the error started.
-    ]b4_location_type[ yyerrloc = null;
-
-    /// ]b4_location_type[ of the lookahead.
-    ]b4_location_type[ yylloc = new ]b4_location_type[ (null, null);
-
-    /// @@$.
-    ]b4_location_type[ yyloc;])
-
-    /// Semantic value of the lookahead.
-    b4_yystype[ yylval = null;
-
+    ]b4_locations_if([/// @@$.
+    b4_location_type yyloc;])[
+]b4_push_if([],[dnl
+    // if using pull parser only, define state as method variables
+b4_define_state
+    int label = YYNEWSTATE;
     yycdebug ("Starting parse\n");
     yyerrstatus_ = 0;

-]m4_ifdef([b4_initial_action], [
+    /* Initialize the stack.  */
+    yystack.push (yystate, yylval b4_locations_if([, yylloc]));
+m4_ifdef([b4_initial_action], [
 b4_dollar_pushdef([yylval], [], [yylloc])dnl
     /* User initialization code.  */
     b4_user_initial_action
-b4_dollar_popdef])[]dnl
-
-  [  /* Initialize the stack.  */
-    yystack.push (yystate, yylval]b4_locations_if([, yylloc])[);
-
-    int label = YYNEWSTATE;
+b4_dollar_popdef[]dnl
+])
+])[
+]b4_push_if([
+    int label;
+    boolean havenexttoken = true;
+
+    if(!push_parse_initialized) {
+      push_parse_initialize();
+      label = YYNEWSTATE;
+      yycdebug ("Starting parse\n");
+      yyerrstatus_ = 0;
+m4_ifdef([b4_initial_action], [
+b4_dollar_pushdef([yylval], [], [yylloc])dnl
+    /* User initialization code.  */
+    b4_user_initial_action
+b4_dollar_popdef[]dnl
+])
+    } else
+      label = YYGETTOKEN;
+])[
     for (;;)
       switch (label)
       {
@@ -522,7 +615,7 @@ b4_dollar_popdef])[]dnl

         /* Accept?  */
         if (yystate == yyfinal_)
-          return true;
+]b4_push_if([ {label = YYACCEPT; break;}],[ return true;])[

         /* Take a decision.  First try without lookahead.  */
         yyn = yypact_[yystate];
@@ -531,16 +624,29 @@ b4_dollar_popdef])[]dnl
             label = YYDEFAULT;
             break;
           }
+]b4_push_if([        /* Fall Through */

+      case YYGETTOKEN:])[
         /* Read a lookahead token.  */
         if (yychar == yyempty_)
           {
+]b4_push_if([
+            if(!havenexttoken) {
+              return YYMORE;
+            }
+            yycdebug ("Reading a token: ");
+            yychar = yylextoken;
+            yylval = yylexval;
+            b4_locations_if([yylloc = yylexloc;])
+            havenexttoken = false;])[
+]b4_push_if([],[dnl else !push_if
             yycdebug ("Reading a token: ");
-            yychar = yylexer.yylex ();]
-            b4_locations_if([[
-            yylloc = new ]b4_location_type[(yylexer.getStartPos (),
-                            yylexer.getEndPos ());]])
-            yylval = yylexer.getLVal ();[
+            yychar = yylexer.yylex ();
+            yylval = yylexer.getLVal ();
+            b4_locations_if([dnl
+            yylloc = new b4_location_type (yylexer.getStartPos (),
+                            yylexer.getEndPos ());])
+])[
           }

         /* Convert token to internal form.  */
@@ -633,13 +739,13 @@ b4_dollar_popdef])[]dnl
         /* If just tried and failed to reuse lookahead token after an
          error, discard it.  */

-        if (yychar <= Lexer.EOF)
-          {
-          /* Return failure if at end of input.  */
-          if (yychar == Lexer.EOF)
-            return false;
-          }
-        else
+          if (yychar <= Lexer.EOF)
+            {
+            /* Return failure if at end of input.  */
+            if (yychar == Lexer.EOF)
+]b4_push_if([ {label = YYABORT; break;}],[ return false;])[
+            }
+          else
               yychar = yyempty_;
           }

@@ -684,7 +790,7 @@ b4_dollar_popdef])[]dnl

             /* Pop the current state because it cannot handle the error token. */
             if (yystack.height == 0)
-              return false;
+]b4_push_if([ {label = YYABORT; break;}],[ return false;])[

             ]b4_locations_if([yyerrloc = yystack.locationAt (0);])[
             yystack.pop ();
@@ -693,6 +799,9 @@ b4_dollar_popdef])[]dnl
               yystack.print (yyDebugStream);
           }

+        if(label == YYABORT)
+            break;/* leave the switch */
+
         ]b4_locations_if([
         /* Muck with the stack to setup for yylloc.  */
         yystack.push (0, null, yylloc);
@@ -711,13 +820,85 @@ b4_dollar_popdef])[]dnl

         /* Accept.  */
       case YYACCEPT:
-        return true;
+]b4_push_if([ push_parse_initialized = false; return YYACCEPT;],[ return true;])[

         /* Abort.  */
       case YYABORT:
-        return false;
+]b4_push_if([ push_parse_initialized = false; return YYABORT;],[ return false;])[
       }
+}
+]b4_push_if([[
+  boolean push_parse_initialized = false;
+
+  public void push_parse_initialize()
+  {
+    // (Re-)Initialize the state
+    /// Lookahead and lookahead in internal form.
+    this.yychar = yyempty_;
+    this.yytoken = 0;
+
+    /* State.  */
+    this.yyn = 0;
+    this.yylen = 0;
+    this.yystate = 0;
+    this.yystack = new YYStack ();
+
+    /* Error handling.  */
+    this.yynerrs_ = 0;
+    ]b4_locations_if([/// The location where the error started.
+    this.yyerrloc = null;
+    this.yylloc = new b4_location_type (null, null);])[
+
+    /// Semantic value of the lookahead.
+    this.yylval = null;
+
+    yystack.push (this.yystate, this.yylval]b4_locations_if([, this.yylloc])[);
+    push_parse_initialized = true;
+  }
+]b4_locations_if([
+  /**
+   * Push-parse one token supplied by an external lexer.
+   *
+   * @@param yylextoken current token
+   * @@param yylexval current lval
+   * @@param yylexpos current position
+   *
+   * @@return <tt>YYACCEPT, YYABORT, YYMORE</tt>
+   */
+  public int push_parse (int yylextoken, b4_yystype yylexval, b4_position_type yylexpos)
+      b4_maybe_throws([b4_list2([b4_lex_throws], [b4_throws])])
+  {
+    return push_parse (yylextoken, yylexval, new b4_location_type (yylexpos));
   }
+])[]])
+
+b4_both_if([
+dnl Define the core pull parse procedure header when api.push-pull=both
+  /**
+   * Parse input from the scanner that was specified at object construction
+   * time.  Return whether the end of the input was reached successfully.
+   *
+   * @@return <tt>true</tt> if the parsing succeeds. Note that this does not
+   *          imply that there were no syntax errors.
+   */
+  public boolean parse () b4_maybe_throws([b4_list2([b4_lex_throws], [b4_throws])])
+   {
+      int status;
+      if(yylexer == null)
+        throw new NullPointerException("Null Lexer");
+      do {
+        int token = yylexer.yylex();
+        b4_yystype lval = yylexer.getLVal();
+b4_locations_if([dnl
+        b4_location_type yyloc = new b4_location_type (yylexer.getStartPos (),
+                                              yylexer.getEndPos ());])
+        this.yyerrstatus_ = 0;
+        b4_locations_if([status = push_parse(token,lval,yyloc);],[dnl else !locations_if
+        status = push_parse(token,lval);])
+      } while (status == YYMORE);
+      return (status == YYACCEPT);
+  }
+])[

   // Generate an error message.
   private String yysyntax_error (int yystate, int tok)
@@ -825,6 +1006,7 @@ b4_dollar_popdef])[]dnl
   ]b4_integral_parser_table_define([rline], [b4_rline],
   [[YYRLINE[YYN] -- Source line where rule number YYN was defined.]])[

+
   // Report on the debug stream that the rule yyrule is going to be reduced.
   private void yy_reduce_print (int yyrule, YYStack yystack)
   {
diff --git a/doc/bison.texi b/doc/bison.texi
index 8d7bb43..98deb77 100644
--- a/doc/bison.texi
+++ b/doc/bison.texi
@@ -358,6 +358,7 @@ Java Parsers
 * Java Parser Interface::       Instantiating and running the parser
 * Java Scanner Interface::      Specifying the scanner for the parser
 * Java Action Features::        Special features for use in actions
+* Java Push Parser Interface::  Instantiating and running a push parser
 * Java Differences::            Differences between C/C++ and Java Grammars
 * Java Declarations Summary::   List of Bison declarations used with Java

@@ -11175,6 +11176,7 @@ main (int argc, char *argv[])
 * Java Parser Interface::       Instantiating and running the parser
 * Java Scanner Interface::      Specifying the scanner for the parser
 * Java Action Features::        Special features for use in actions
+* Java Push Parser Interface::  Instantiating and running a push parser
 * Java Differences::            Differences between C/C++ and Java Grammars
 * Java Declarations Summary::   List of Bison declarations used with Java
 @end menu
@@ -11566,6 +11568,77 @@ available only if location tracking is active.
 @end deftypefn


+@node Java Push Parser Interface
+@subsection Java Push Parser Interface
address@hidden - define push_parse
address@hidden %define api.push-pull
+
+(The current push parsing interface is experimental and may evolve. More user feedback will help to stabilize it.)
+
+Normally, Bison generates a pull parser for Java.
+The following Bison declaration says that you want the parser to be a push
+parser (@pxref{%define Summary,,api.push-pull}):
+
+@example
+%define api.push-pull push
+@end example
+
+Most of the discussion of the
+Java pull parser interface
+(@pxref{Java Parser Interface})
+applies to the push parser interface as well.
+
+When generating a push parser, Bison creates a @code{push_parse} method
+with the following signatures (depending on whether locations are
+enabled):
+
+@deftypemethod {YYParser} {int} push_parse ({int} @var{token}, {Object} @var{yylval})
+@deftypemethodx {YYParser} {int} push_parse ({int} @var{token}, {Object} @var{yylval}, {Location} @var{yyloc})
+@deftypemethodx {YYParser} {int} push_parse ({int} @var{token}, {Object} @var{yylval}, {Position} @var{yypos})
+@end deftypemethod
+
+The primary difference with respect to
+a pull parser is that the caller invokes
+@code{push_parse} repeatedly, once per token.
+This method is available if either
+@code{%define api.push-pull push} or @code{%define api.push-pull both}
+is declared (@pxref{%define Summary,,api.push-pull}).
+The @code{Location} and @code{Position} parameters are
+available only if location tracking is active.
+
+The value returned by the @code{push_parse} method
+is one of the following three constants:
+@code{YYACCEPT}, @code{YYABORT}, or @code{YYMORE}.
+The last of these,
+@code{YYMORE}, is returned when more input is required to finish
+parsing the grammar.
+
+If @code{api.push-pull} is declared as @code{both}, the generated parser
+class also implements the @code{parse} method.  Its body is a loop that
+repeatedly invokes the scanner and passes the values it obtains to
+@code{push_parse}, until @code{push_parse} returns something other than
+@code{YYMORE}; @code{parse} returns true if that status is @code{YYACCEPT}.
+
+There is one additional complication.
+Technically, the push parser does not need to know about the scanner
+(i.e., an object implementing the @code{YYParser.Lexer} interface),
+but it does need access to the @code{yyerror} method.
+The current approach, which is subject to change, is to require
+the @code{YYParser} constructor to be given an object implementing
+the @code{YYParser.Lexer} interface.  This object need only
+implement the @code{yyerror} method; the other methods can be stubbed
+since a push-only parser never invokes them.
+The simplest way to do this is to add code like the following to your
+grammar file:
+
+@example
+%code lexer @{
+public Object getLVal() @{return null;@}
+public int yylex() @{return 0;@}
+public void yyerror(String s) @{System.err.println(s);@}
+@}
+@end example
+
 @node Java Differences
 @subsection Differences between C/C++ and Java Grammars

diff --git a/tests/local.mk b/tests/local.mk
index 7bc8b78..b27b96c 100644
--- a/tests/local.mk
+++ b/tests/local.mk
@@ -53,6 +53,7 @@ TESTSUITE_AT =                                  \
   tests/headers.at                              \
   tests/input.at                                \
   tests/java.at                                 \
+  tests/javapush.at                             \
   tests/local.at                                \
   tests/named-refs.at                           \
   tests/output.at                               \
diff --git a/tests/testsuite.at b/tests/testsuite.at
index f11866b..0225876 100644
--- a/tests/testsuite.at
+++ b/tests/testsuite.at
@@ -74,3 +74,4 @@ m4_include([glr-regression.at])

 # Push parsing specific tests.
 m4_include([push.at])
+m4_include([javapush.at])
--
1.7.4.4




