bison-patches
RFC: help me with Java enums please


From: Akim Demaille
Subject: RFC: help me with Java enums please
Date: Sat, 4 Apr 2020 09:12:27 +0200

Hi all,

Bison 3.6 brings a few important changes to all the skeletons, but I am no Java
programmer, and there are things I don't know how to do.  Because these changes
have a significant impact on the API, I would like to avoid stupid mistakes
that, because of backward compatibility, we might have to live with for a long
time.

So _please_, if you are knowledgeable in Java, help me!

One of the most significant changes in Bison 3.6 is that the symbol kind (PLUS,
LPAREN, NUMBER, etc.) is no longer stored in a plain int type but in an enum.
The motivation is that the symbol kinds used to be hidden from the user, but
they now have to be exposed for the real new feature of 3.6: customized error
messages.

Let me take examples/java/calc/Calc.y as an example:

> %token
>     BANG   "!"
>     PLUS   "+"
>     MINUS  "-"
>     STAR   "*"
>     SLASH  "/"
>     CARET  "^"
>     LPAREN "("
>     RPAREN ")"
>     EQUAL  "="
>     EOL    _("end of line")
>   <Integer>
>     NUM    _("number")


This generates the following enum (don't pay too much attention to its name, 
it's likely to change from SymbolType to symbolKind):

>     public enum SymbolType
>     {
>         YYSYMBOL_YYEMPTY(-2),
>         YYSYMBOL_YYEOF(0),
>         YYSYMBOL_YYERROR(1),
>         YYSYMBOL_YYUNDEF(2),
>         YYSYMBOL_BANG(3),
>         YYSYMBOL_PLUS(4),
>         YYSYMBOL_MINUS(5),
>         YYSYMBOL_STAR(6),
>         YYSYMBOL_SLASH(7),
>         YYSYMBOL_CARET(8),
>         YYSYMBOL_LPAREN(9),
>         YYSYMBOL_RPAREN(10),
>         YYSYMBOL_EQUAL(11),
>         YYSYMBOL_EOL(12),
>         YYSYMBOL_NUM(13),
>         YYSYMBOL_NEG(14),
>         YYSYMBOL_YYACCEPT(15),
>         YYSYMBOL_input(16),
>         YYSYMBOL_line(17),
>         YYSYMBOL_exp(18),
>         YYNTOKENS(15); ///< Number of tokens.
> 
>         private int code;
> 
>         SymbolType (int n) {
>             this.code = n;
>         }
>         static SymbolType get (int code) {
>             switch (code) {
>             default: return YYSYMBOL_YYUNDEF;
>             case 0: return SymbolType.YYSYMBOL_YYEOF;
>             case 1: return SymbolType.YYSYMBOL_YYERROR;
>             case 2: return SymbolType.YYSYMBOL_YYUNDEF;
>             case 3: return SymbolType.YYSYMBOL_BANG;
>             case 4: return SymbolType.YYSYMBOL_PLUS;
>             case 5: return SymbolType.YYSYMBOL_MINUS;
>             case 6: return SymbolType.YYSYMBOL_STAR;
>             case 7: return SymbolType.YYSYMBOL_SLASH;
>             case 8: return SymbolType.YYSYMBOL_CARET;
>             case 9: return SymbolType.YYSYMBOL_LPAREN;
>             case 10: return SymbolType.YYSYMBOL_RPAREN;
>             case 11: return SymbolType.YYSYMBOL_EQUAL;
>             case 12: return SymbolType.YYSYMBOL_EOL;
>             case 13: return SymbolType.YYSYMBOL_NUM;
>             case 14: return SymbolType.YYSYMBOL_NEG;
>             case 15: return SymbolType.YYSYMBOL_YYACCEPT;
>             case 16: return SymbolType.YYSYMBOL_input;
>             case 17: return SymbolType.YYSYMBOL_line;
>             case 18: return SymbolType.YYSYMBOL_exp;
> 
>             }
>         }
>         int getCode () {
>             return this.code;
>         }
>     };

So it's kind of heavyweight: we are carrying a full-blown object for each
single instance of SymbolType.  And we have to use the class method
SymbolType.get() to find the enum from its numerical code:

>   private static final SymbolType yytranslate_ (int t)
>   {
>     int user_token_number_max_ = 269;
>     if (t <= 0)
>       return SymbolType.YYSYMBOL_YYEOF;
>     else if (t <= user_token_number_max_)
>       return SymbolType.get (yytranslate_table_[t]);   // ****************
>     else
>       return SymbolType.YYSYMBOL_YYUNDEF;
>   }


and to use the object method SymbolType.getCode() to find the numerical code of 
an enum.

>         yyn += yytoken.getCode ();
>         if (yyn < 0 || yylast_ < yyn || yycheck_[yyn] != yytoken.getCode ())
>           label = YYDEFAULT;
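
For comparison, one possible alternative to the generated switch is a lookup
table built once from values().  This is only a sketch under my own
assumptions: the names SymbolLookupSketch, SymbolKindSketch and BY_CODE are
hypothetical and are not what the skeleton emits, and only a handful of symbols
are listed.  Like the switch above, it maps every unknown code (including -2)
to YYUNDEF.

```java
import java.util.Arrays;

// Hypothetical miniature of the generated enum; not Bison output.
public class SymbolLookupSketch {
    enum SymbolKindSketch {
        YYEMPTY(-2), YYEOF(0), YYERROR(1), YYUNDEF(2), BANG(3), PLUS(4);

        private final int code;

        // Built once at class initialization.  values() allocates a new
        // array on each call, so it is not called from get().
        private static final SymbolKindSketch[] BY_CODE = buildTable();

        SymbolKindSketch(int code) {
            this.code = code;
        }

        private static SymbolKindSketch[] buildTable() {
            int max = 0;
            for (SymbolKindSketch k : values())
                max = Math.max(max, k.code);
            SymbolKindSketch[] table = new SymbolKindSketch[max + 1];
            // Codes without a symbol fall back to YYUNDEF.
            Arrays.fill(table, YYUNDEF);
            for (SymbolKindSketch k : values())
                if (k.code >= 0)
                    table[k.code] = k;
            return table;
        }

        // Same contract as the switch-based get: unknown codes give YYUNDEF.
        static SymbolKindSketch get(int code) {
            return (0 <= code && code < BY_CODE.length) ? BY_CODE[code] : YYUNDEF;
        }

        int getCode() {
            return code;
        }
    }

    public static void main(String[] args) {
        System.out.println(SymbolKindSketch.get(4));   // PLUS
        System.out.println(SymbolKindSketch.get(99));  // YYUNDEF
        System.out.println(SymbolKindSketch.get(-2));  // YYUNDEF
    }
}
```

Whether this is any faster than the switch is something I have not measured.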


The JVMs are known to be quite good at optimizing, but I have no idea whether 
this abstraction will be zero-cost or not.
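
One thing that can be checked without any benchmark: each Java enum constant
is created exactly once, and a variable of the enum type only holds a reference
to it.  The sketch below uses a hypothetical miniature enum (EnumIdentitySketch
and Kind are made-up names, not the generated code) to show that repeated
lookups return the very same object.  It says nothing about what the JIT does
with getCode().

```java
// Hypothetical miniature of the generated enum, for illustration only.
public class EnumIdentitySketch {
    enum Kind {
        YYEOF(0), YYERROR(1), YYUNDEF(2), PLUS(4);

        private final int code;

        Kind(int code) {
            this.code = code;
        }

        int getCode() {
            return code;
        }

        static Kind get(int code) {
            switch (code) {
                case 0: return YYEOF;
                case 1: return YYERROR;
                case 4: return PLUS;
                default: return YYUNDEF;
            }
        }
    }

    public static void main(String[] args) {
        Kind a = Kind.get(4);
        Kind b = Kind.get(4);
        // Both lookups return the same object: nothing is allocated per lookup.
        System.out.println(a == b);                      // true
        // An array of kinds stores references to the shared constants.
        Kind[] expected = { a, b, Kind.PLUS };
        System.out.println(expected[0] == expected[2]);  // true
    }
}
```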

So I would definitely need some feedback.  Is this the right move for Java?  Or
should we stick to some plain integral type (on which we would have no type
checking)?
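
To make the type-checking point concrete, here is a small sketch.  All the
names in it (TypeCheckSketch, Kind, nameOfInt, nameOfKind) and the value 258
are hypothetical; it is not the generated parser.  With an int parameter,
an external token number can be passed where an internal symbol number is
expected and the compiler stays silent; with an enum parameter the same
mistake does not compile.

```java
// Hypothetical sketch contrasting an int-based and an enum-based API.
public class TypeCheckSketch {
    enum Kind { YYEOF, YYERROR, YYUNDEF, PLUS }

    // int-based: any int is accepted, whatever it was meant to represent.
    static String nameOfInt(int symbol) {
        String[] names = { "end of file", "error", "invalid token", "+" };
        return (0 <= symbol && symbol < names.length) ? names[symbol] : "invalid token";
    }

    // enum-based: only a Kind is accepted.
    static String nameOfKind(Kind kind) {
        switch (kind) {
            case YYEOF:   return "end of file";
            case YYERROR: return "error";
            case PLUS:    return "+";
            default:      return "invalid token";
        }
    }

    public static void main(String[] args) {
        // Made-up external token number, as a scanner might return it.
        int externalTokenNumber = 258;

        // Compiles, although an external number is used as an internal one.
        System.out.println(nameOfInt(externalTokenNumber));  // invalid token

        // The equivalent mistake is rejected at compile time:
        // nameOfKind(externalTokenNumber);  // error: int cannot be converted to Kind

        System.out.println(nameOfKind(Kind.PLUS));           // +
    }
}
```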

Obviously the `YYSYMBOL_` prefix in the enums seems redundant.  That detail
will be taken care of later; don't pay attention to it for now either.

I have created a PR (https://github.com/akimd/bison/pull/34) for those who want
to play with the branch.

Cheers!


commit 6a26b99dc6f35f5a12d31fac345ace64f641fda2
Author: Akim Demaille <address@hidden>
Date:   Mon Mar 30 07:45:01 2020 +0200

    java: use SymbolType
    
    The Java enums are very different from the C model.  As a consequence,
    one cannot "build" an enum directly from an integer, we must retrieve
    it.  That's the purpose of the SymbolType.get class method.
    
    * data/skeletons/java.m4 (b4_symbol_enum, b4_case_code_symbol)
    (b4_declare_symbol_enum): New.
    * data/skeletons/lalr1.java: Use SymbolType,
    SymbolType.YYSYMBOL_YYEMPTY, etc.
    * examples/java/calc/Calc.y, tests/local.at: Adjust.

diff --git a/TODO b/TODO
index f314682a..5558c2a3 100644
--- a/TODO
+++ b/TODO
@@ -5,6 +5,9 @@
 - YYNOMEM
 - i18n in Java
 
+** Java
+Check api.token.raw
+
 ** Naming conventions
 yysyntax_error_arguments should be yy_syntax_error_arguments, since it's a
 private implementation detail.
diff --git a/data/skeletons/java.m4 b/data/skeletons/java.m4
index 6b8fe514..e33803dd 100644
--- a/data/skeletons/java.m4
+++ b/data/skeletons/java.m4
@@ -128,9 +128,9 @@ m4_define([b4_integral_parser_table_define],
 [b4_typed_parser_table_define([b4_int_type_for([$2])], [$1], [$2], [$3])])
 
 
-## ------------------------- ##
-## Assigning token numbers.  ##
-## ------------------------- ##
+## -------------------------- ##
+## (External) token numbers.  ##
+## -------------------------- ##
 
 # b4_token_enum(TOKEN-NUM)
 # ------------------------
@@ -147,7 +147,58 @@ m4_define([b4_token_enums],
 [b4_any_token_visible_if([/* Tokens.  */
 b4_symbol_foreach([b4_token_enum])])])
 
-# b4-case(ID, CODE)
+
+
+## --------------------------- ##
+## (Internal) symbol numbers.  ##
+## --------------------------- ##
+
+# b4_symbol_enum(SYMBOL-NUM)
+# --------------------------
+# Output the definition of this symbol as an enum.
+m4_define([b4_symbol_enum],
+[m4_ifval(b4_symbol([$1], [sid]),
+         [m4_format([[%s(%s)]],
+                    b4_symbol([$1], [sid]),
+                    b4_symbol([$1], [number]))])])
+
+
+m4_define([b4_case_code_symbol],
+[[        case $1: return SymbolType.]b4_symbol([$1], [sid]);
+])
+
+# b4_declare_symbol_enum
+# ----------------------
+# The definition of the symbol internal numbers as an enum.
+m4_define([b4_declare_symbol_enum],
+[[  public enum SymbolType
+  {
+    ]m4_join([,
+    ],
+             ]b4_symbol_sid([-2])[(-2),
+             b4_symbol_map([b4_symbol_enum]),
+             [YYNTOKENS(]b4_tokens_number[); ///< Number of tokens.])[;
+
+    private int code;
+
+    SymbolType (int n) {
+      this.code = n;
+    }
+    static SymbolType get (int code) {
+      switch (code) {
+        default: return YYSYMBOL_YYUNDEF;
+]b4_symbol_foreach([b4_case_code_symbol])[
+      }
+    }
+    int getCode () {
+      return this.code;
+    }
+  };
+]])])
+
+
+
+# b4_case(ID, CODE)
 # -----------------
 # We need to fool Java's stupid unreachable code detection.
 m4_define([b4_case],
@@ -157,6 +208,7 @@ m4_define([b4_case],
   break;
 ])
 
+
 # b4_predicate_case(LABEL, CONDITIONS)
 # ------------------------------------
 m4_define([b4_predicate_case],
diff --git a/data/skeletons/lalr1.java b/data/skeletons/lalr1.java
index 195fed1a..fe3a135a 100644
--- a/data/skeletons/lalr1.java
+++ b/data/skeletons/lalr1.java
@@ -55,7 +55,7 @@ b4_use_push_for_pull_if([
 m4_define([b4_define_state],[[
     /* Lookahead and lookahead in internal form.  */
     int yychar = yyempty_;
-    int yytoken = 0;
+    SymbolType yytoken = SymbolType.YYSYMBOL_YYEMPTY;
 
     /* State.  */
     int yyn = 0;
@@ -171,6 +171,8 @@ import java.text.MessageFormat;
       return new ]b4_location_type[ (rhs.locationAt (0).end);
   }]])[
 
+]b4_declare_symbol_enum[
+
   /**
    * Communication interface between the scanner and the Bison-generated
    * parser <tt>]b4_parser_class[</tt>.
@@ -482,7 +484,7 @@ import java.text.MessageFormat;
         default: break;
       }]b4_parse_trace_if([[
 
-    yySymbolPrint ("-> $$ =", yyr1_[yyn], yyval]b4_locations_if([, yyloc])[);]])[
+    yySymbolPrint ("-> $$ =", SymbolType.get (yyr1_[yyn]), yyval]b4_locations_if([, yyloc])[);]])[
 
     yystack.pop (yylen);
     yylen = 0;
@@ -497,11 +499,11 @@ import java.text.MessageFormat;
   | Print this symbol on YYOUTPUT.  |
   `--------------------------------*/
 
-  private void yySymbolPrint (String s, int yytype,
+  private void yySymbolPrint (String s, SymbolType yytype,
                              ]b4_yystype[ yyvaluep]dnl
                               b4_locations_if([, Object yylocationp])[)
   {
-    yycdebug (s + (yytype < yyntokens_ ? " token " : " nterm ")
+    yycdebug (s + (yytype.getCode () < yyntokens_ ? " token " : " nterm ")
               + yysymbolName (yytype) + " ("]b4_locations_if([
               + yylocationp + ": "])[
               + (yyvaluep == null ? "(null)" : yyvaluep.toString ()) + ")");
@@ -611,8 +613,8 @@ b4_dollar_popdef[]dnl
 
         /* If the proper action on seeing token YYTOKEN is to reduce or to
            detect an error, take that action.  */
-        yyn += yytoken;
-        if (yyn < 0 || yylast_ < yyn || yycheck_[yyn] != yytoken)
+        yyn += yytoken.getCode ();
+        if (yyn < 0 || yylast_ < yyn || yycheck_[yyn] != yytoken.getCode ())
           label = YYDEFAULT;
 
         /* <= 0 means reduce or error.  */
@@ -676,7 +678,7 @@ b4_dollar_popdef[]dnl
           {
             ++yynerrs;
             if (yychar == yyempty_)
-              yytoken = yyempty_;
+              yytoken = SymbolType.YYSYMBOL_YYEMPTY;
             yyreportSyntaxError (new Context (yystack, yytoken]b4_locations_if([[, yylloc]])[));
           }
 
@@ -726,8 +728,9 @@ b4_dollar_popdef[]dnl
             yyn = yypact_[yystate];
             if (!yyPactValueIsDefault (yyn))
               {
-                yyn += yy_error_token_;
-                if (0 <= yyn && yyn <= yylast_ && yycheck_[yyn] == yy_error_token_)
+                yyn += SymbolType.YYSYMBOL_YYERROR.getCode ();
+                if (0 <= yyn && yyn <= yylast_
+                    && yycheck_[yyn] == SymbolType.YYSYMBOL_YYERROR.getCode ())
                   {
                     yyn = yytable_[yyn];
                     if (0 < yyn)
@@ -760,7 +763,7 @@ b4_dollar_popdef[]dnl
         yystack.pop (2);]])[
 
         /* Shift the error token.  */]b4_parse_trace_if([[
-        yySymbolPrint ("Shifting", yystos_[yyn],
+        yySymbolPrint ("Shifting", SymbolType.get (yystos_[yyn]),
                        yylval]b4_locations_if([, yyloc])[);]])[
 
         yystate = yyn;
@@ -789,7 +792,7 @@ b4_dollar_popdef[]dnl
   {
     /* Lookahead and lookahead in internal form.  */
     this.yychar = yyempty_;
-    this.yytoken = 0;
+    this.yytoken = SymbolType.YYSYMBOL_YYEMPTY;
 
     /* State.  */
     this.yyn = 0;
@@ -861,7 +864,7 @@ b4_dollar_popdef[]dnl
    */
   public static final class Context
   {
-    Context (YYStack stack, int token]b4_locations_if([[, ]b4_location_type[ loc]])[)
+    Context (YYStack stack, SymbolType token]b4_locations_if([[, ]b4_location_type[ loc]])[)
     {
       yystack = stack;
       yytoken = token;]b4_locations_if([[
@@ -870,7 +873,7 @@ b4_dollar_popdef[]dnl
 
     private YYStack yystack;
 
-    public int getToken ()
+    public SymbolType getToken ()
     {
       return yytoken;
     }
@@ -880,7 +883,7 @@ b4_dollar_popdef[]dnl
      */
     public static final int EMPTY = ]b4_parser_class[.yyempty_;
 
-    private int yytoken;]b4_locations_if([[
+    private SymbolType yytoken;]b4_locations_if([[
     public ]b4_location_type[ getLocation ()
     {
       return yylocation;
@@ -893,12 +896,12 @@ b4_dollar_popdef[]dnl
        current YYCTX, and return the number of tokens stored in YYARG.  If
        YYARG is null, return the number of expected tokens (guaranteed to
        be less than YYNTOKENS).  */
-    int yyexpectedTokens (int yyarg[], int yyargn)
+    int yyexpectedTokens (SymbolType yyarg[], int yyargn)
     {
       return yyexpectedTokens (yyarg, 0, yyargn);
     }
 
-    int yyexpectedTokens (int yyarg[], int yyoffset, int yyargn)
+    int yyexpectedTokens (SymbolType yyarg[], int yyoffset, int yyargn)
     {
       int yycount = yyoffset;
       int yyn = yypact_[this.yystack.stateAt (0)];
@@ -912,16 +915,16 @@ b4_dollar_popdef[]dnl
           /* Stay within bounds of both yycheck and yytname.  */
           int yychecklim = yylast_ - yyn + 1;
           int yyxend = yychecklim < NTOKENS ? yychecklim : NTOKENS;
-          for (int x = yyxbegin; x < yyxend; ++x)
-            if (yycheck_[x + yyn] == x && x != yy_error_token_
-                && !yyTableValueIsError (yytable_[x + yyn]))
+          for (int yyx = yyxbegin; yyx < yyxend; ++yyx)
+            if (yycheck_[yyx + yyn] == yyx && yyx != SymbolType.YYSYMBOL_YYERROR.getCode ()
+                && !yyTableValueIsError (yytable_[yyx + yyn]))
               {
                 if (yyarg == null)
                   yycount += 1;
                 else if (yycount == yyargn)
                   return 0; // FIXME: this is incorrect.
                 else
-                  yyarg[yycount++] = x;
+                  yyarg[yycount++] = SymbolType.get (yyx);
               }
         }
       return yycount - yyoffset;
@@ -929,7 +932,7 @@ b4_dollar_popdef[]dnl
 
     /* The user-facing name of the symbol whose (internal) number is
        YYSYMBOL.  No bounds checking.  */
-    static String yysymbolName (int yysymbol)
+    static String yysymbolName (SymbolType yysymbol)
     {
       return ]b4_parser_class[.yysymbolName (yysymbol);
     }
@@ -937,7 +940,7 @@ b4_dollar_popdef[]dnl
 
 ]b4_parse_error_bmatch(
 [detailed\|verbose], [[
-  private int yysyntaxErrorArguments (Context yyctx, int[] yyarg, int yyargn)
+  private int yysyntaxErrorArguments (Context yyctx, SymbolType[] yyarg, int yyargn)
   {
     /* There are many possibilities here to consider:
        - If this state is a consistent state with a default action,
@@ -966,7 +969,7 @@ b4_dollar_popdef[]dnl
          to an error action in a later state.
     */
     int yycount = 0;
-    if (yyctx.getToken () != yyempty_)
+    if (yyctx.getToken () != SymbolType.YYSYMBOL_YYEMPTY)
       {
         yyarg[yycount++] = yyctx.getToken ();
         yycount += yyctx.yyexpectedTokens (yyarg, 1, yyargn);
@@ -986,7 +989,7 @@ b4_dollar_popdef[]dnl
     if (yyErrorVerbose)
       {
         final int argmax = 5;
-        int[] yyarg = new int[argmax];
+        SymbolType[] yyarg = new SymbolType[argmax];
         int yycount = yysyntaxErrorArguments (yyctx, yyarg, argmax);
         String[] yystr = new String[yycount];
         for (int yyi = 0; yyi < yycount; ++yyi)
@@ -1077,21 +1080,21 @@ b4_dollar_popdef[]dnl
 
   /* The user-facing name of the symbol whose (internal) number is
      YYSYMBOL.  No bounds checking.  */
-  static String yysymbolName (int yysymbol)
+  static String yysymbolName (SymbolType yysymbol)
   {
-    return yytnamerr_ (yytname_[yysymbol]);
+    return yytnamerr_ (yytname_[yysymbol.getCode ()]);
   }
 ]],
         [custom\|detailed],
 [[  /* The user-facing name of the symbol whose (internal) number is
      YYSYMBOL.  No bounds checking.  */
-  static String yysymbolName (int yysymbol)
+  static String yysymbolName (SymbolType yysymbol)
   {
     String[] yy_sname =
     {
     ]b4_symbol_names[
     };
-    return yy_sname[yysymbol];
+    return yy_sname[yysymbol.getCode ()];
   }]])[
 
 ]b4_parse_trace_if([[
@@ -1114,35 +1117,31 @@ b4_dollar_popdef[]dnl
     /* The symbols being reduced.  */
     for (int yyi = 0; yyi < yynrhs; yyi++)
       yySymbolPrint ("   $" + (yyi + 1) + " =",
-                     yystos_[yystack.stateAt (yynrhs - (yyi + 1))],
+                     SymbolType.get (yystos_[yystack.stateAt (yynrhs - (yyi + 1))]),
                      ]b4_rhs_data(yynrhs, yyi + 1)b4_locations_if([,
                      b4_rhs_location(yynrhs, yyi + 1)])[);
   }]])[
 
   /* YYTRANSLATE_(TOKEN-NUM) -- Symbol number corresponding to TOKEN-NUM
      as returned by yylex, with out-of-bounds checking.  */
-  private static final ]b4_int_type_for([b4_translate])[ yytranslate_ (int t)
+  private static final SymbolType yytranslate_ (int t)
 ]b4_api_token_raw_if(dnl
 [[  {
-    return t;
+    return SymbolType.get (t);
   }
 ]],
 [[  {
     int user_token_number_max_ = ]b4_user_token_number_max[;
-    ]b4_int_type_for([b4_translate])[ undef_token_ = ]b4_undef_token_number[;
-
     if (t <= 0)
-      return Lexer.EOF;
+      return SymbolType.YYSYMBOL_YYEOF;
     else if (t <= user_token_number_max_)
-      return yytranslate_table_[t];
+      return SymbolType.get (yytranslate_table_[t]);
     else
-      return undef_token_;
+      return SymbolType.YYSYMBOL_YYUNDEF;
   }
   ]b4_integral_parser_table_define([translate_table], [b4_translate])[
 ]])[
 
-  private static final ]b4_int_type_for([b4_translate])[ yy_error_token_ = 1;
-
   private static final int yylast_ = ]b4_last[;
   private static final int yynnts_ = ]b4_nterms_number[;
   private static final int yyempty_ = -2;
diff --git a/examples/java/calc/Calc.y b/examples/java/calc/Calc.y
index dc502671..7232502e 100644
--- a/examples/java/calc/Calc.y
+++ b/examples/java/calc/Calc.y
@@ -112,15 +112,15 @@ class CalcLexer implements Calc.Lexer {
     System.err.print (ctx.getLocation () + ": syntax error");
     {
       final int TOKENMAX = 10;
-      int[] arg = new int[TOKENMAX];
+      Calc.SymbolType[] arg = new Calc.SymbolType[TOKENMAX];
       int n = ctx.yyexpectedTokens (arg, TOKENMAX);
       for (int i = 0; i < n; ++i)
         System.err.print ((i == 0 ? ": expected " : " or ")
                           + ctx.yysymbolName (arg[i]));
     }
     {
-      int lookahead = ctx.getToken ();
-      if (lookahead != ctx.EMPTY)
+      Calc.SymbolType lookahead = ctx.getToken ();
+      if (lookahead != Calc.SymbolType.YYSYMBOL_YYEMPTY)
         System.err.print (" before " + ctx.yysymbolName (lookahead));
     }
     System.err.println ("");
diff --git a/tests/local.at b/tests/local.at
index 770480da..abba2458 100644
--- a/tests/local.at
+++ b/tests/local.at
@@ -975,12 +975,12 @@ m4_define([AT_YYERROR_DEFINE(java)],
     System.err.print (]AT_LOCATION_IF([[ctx.getLocation () + ": "]]
                       + )["syntax error");
     {
-      int token = ctx.getToken ();
-      if (token != ctx.EMPTY)
+      Calc.SymbolType token = ctx.getToken ();
+      if (token != Calc.SymbolType.YYSYMBOL_YYEMPTY)
         System.err.print (" on token @<:@" + ctx.yysymbolName (token) + "@:>@");
     }
     {
-      int[] arg = new int[ctx.NTOKENS];
+      Calc.SymbolType[] arg = new Calc.SymbolType[ctx.NTOKENS];
       int n = ctx.yyexpectedTokens (arg, ctx.NTOKENS);
       if (0 < n)
         {



