bison-patches

Re: [PATCH 0/5] glr.cc: support syntax_error exceptions


From: Akim Demaille
Subject: Re: [PATCH 0/5] glr.cc: support syntax_error exceptions
Date: Fri, 4 Jan 2019 16:15:43 +0100

Hi Askar,

> On 3 Jan 2019, at 01:20, Askar Safin <address@hidden> wrote:
> 
> Hi. I tested 80ef7e7639f99618bee490b2dea02b5fd9ab28e5, and this commit does 
> fix the bug I reported. But the code still seems buggy. I tested the same 
> code (i.e. the attachments from 
> https://lists.gnu.org/archive/html/bison-patches/2019-01/msg00000.html ) and 
> typed this:
> 
>    echo 2 + 2 % | ./a.out
> 
> And saw this:
> 
>    Invalid character: %
>    4
> 
> And the program exited with status zero (i.e. the parser reported to the 
> main function that parsing succeeded). But if I replace the skeleton with 
> LALR1, I get this:
> 
>    Invalid character: %
> 
> And I get a non-zero exit status.

Good catch, thanks!  I believe this patch is much better than my previous 
failed attempt.

commit 3a0d220a6e38f170332060a1b2e799c9f09e613e
Author: Akim Demaille <address@hidden>
Date:   Thu Jan 3 09:43:36 2019 +0100

    glr.cc: fix the handling of syntax_error from the scanner
    
    Commit 90a8537e6287f92fb3d5be0258a69247a742f12e was right, but issued
    two error messages.  Commit 80ef7e7639f99618bee490b2dea02b5fd9ab28e5
    tried to address that by mapping yychar and yytoken to empty, but that
    completely breaks the invariants of glr.c.  In particular, yygetToken
    can be called repeatedly and is expected to return the latest result,
    unless yytoken is YYEMPTY.  Since the previous attempt was "recording"
    that the token was coming from an exception by setting it to YYEMPTY,
    instead of getting again the faulty token, we fetched another one.
    
    Rather, revert to the first approach: map yytoken to "invalid token",
    but record in yychar the fact that we come from an exception thrown in
    the scanner.
    
    * data/skeletons/glr.c (YYFAULTYTOK): New.
    (yygetToken): Use it to record syntax errors from the scanner.
    * tests/c++.at (Syntax error as exception): In addition to checking
    syntax_error with error recovery, make sure it also behaves as
    expected without.

diff --git a/data/skeletons/glr.c b/data/skeletons/glr.c
index 0d5ccee9..ea838d8f 100644
--- a/data/skeletons/glr.c
+++ b/data/skeletons/glr.c
@@ -327,7 +327,7 @@ static YYLTYPE yyloc_default][]b4_yyloc_default;])[
 #define YYNNTS  ]b4_nterms_number[
 /* YYNRULES -- Number of rules.  */
 #define YYNRULES  ]b4_rules_number[
-/* YYNRULES -- Number of states.  */
+/* YYNSTATES -- Number of states.  */
 #define YYNSTATES  ]b4_states_number[
 /* YYMAXRHS -- Maximum number of symbols on right-hand side of rule.  */
 #define YYMAXRHS ]b4_r2_max[
@@ -335,8 +335,14 @@ static YYLTYPE yyloc_default][]b4_yyloc_default;])[
    accessed by $0, $-1, etc., in any rule.  */
 #define YYMAXLEFT ]b4_max_left_semantic_context[
 
+/* YYMAXUTOK -- Last valid token number (for yychar).  */
+#define YYMAXUTOK   ]b4_user_token_number_max[]b4_glr_cc_if([[
+/* YYFAULTYTOK -- Token number (for yychar) that denotes a
+   syntax_error thrown from the scanner.  */
+#define YYFAULTYTOK (YYMAXUTOK + 1)]])[
+/* YYUNDEFTOK -- Symbol number (for yytoken) that denotes an unknown
+   token.  */
 #define YYUNDEFTOK  ]b4_undef_token_number[
-#define YYMAXUTOK   ]b4_user_token_number_max[
 
 /* YYTRANSLATE(TOKEN-NUM) -- Symbol number corresponding to TOKEN-NUM
    as returned by yylex, with out-of-bounds checking.  */
@@ -782,8 +788,9 @@ yygetToken (int *yycharp][]b4_pure_if([, yyGLRStack* yystackp])[]b4_user_formals
           YYDPRINTF ((stderr, "Caught exception: %s\n", yyexc.what()));]b4_locations_if([
           yylloc = yyexc.location;])[
           yyerror (]b4_lyyerror_args[yyexc.what ());
-          // Leave yytoken/yychar to YYEMPTY.
-          return YYEMPTY;
+          // Map errors caught in the scanner to the undefined token
+          // (YYUNDEFTOK), so that error handling is started.  However, register that
+          *yycharp = YYFAULTYTOK;
         }
 #endif // YY_EXCEPTIONS]], [[
       *yycharp = ]b4_lex[;]])[
@@ -2352,11 +2359,11 @@ b4_dollar_popdef])[]dnl
                 }
               else if (yyisErrorAction (yyaction))
                 {]b4_locations_if([[
-                  yystack.yyerror_range[1].yystate.yyloc = yylloc;]])[
-                  /* If yylex returned no token (YYEMPTY), it already
-                     issued an error message.  */
-                  if (yytoken != YYEMPTY)
-                    yyreportSyntaxError (&yystack]b4_user_args[);
+                  yystack.yyerror_range[1].yystate.yyloc = yylloc;]])[]b4_glr_cc_if([[
+                  /* Don't issue an error message again for exception
+                     thrown from the scanner.  */
+                  if (yychar != YYFAULTYTOK)
+  ]])[                  yyreportSyntaxError (&yystack]b4_user_args[);
                   goto yyuser_error;
                 }
               else
diff --git a/tests/c++.at b/tests/c++.at
index 6b6d2e41..caab842a 100644
--- a/tests/c++.at
+++ b/tests/c++.at
@@ -957,14 +957,17 @@ AT_DATA_GRAMMAR([[input.yy]],
 %define parse.trace
 %%
 
-start:
-  thing
-| start thing
+start: with-recovery | '!' without-recovery;
+
+with-recovery:
+  %empty
+| with-recovery item
+| with-recovery error   { std::cerr << "caught error\n"; }
 ;
 
-thing:
-  error   { std::cerr << "caught error\n"; }
-| item
+without-recovery:
+  %empty
+| without-recovery item
 ;
 
 item:
@@ -988,17 +991,15 @@ yy::parser::error (const std::string &m)
 AT_DATA_SOURCE([scan.cc],
 [[#include "input.hh"
 
+// 'a': valid item, 's': syntax error, 'l': lexical error.
 int
-yylex (yy::parser::semantic_type *)
+yylex (yy::parser::semantic_type *lval)
 {
-  // 's': syntax error, 'l': lexical error.
-  //
-  // Leave enough valid tokens to make sure we recovered from the
-  // previous error, otherwise we might hide some error messages
-  // (discarded during error recovery).
-  static char const *input = "asaaalaa";
-  switch (int res = *input++)
+  switch (int res = getchar ())
   {
+    // Don't choke on echo's \n.
+    case '\n':
+      return yylex (lval);
     case 'l':
       throw yy::parser::syntax_error ("invalid character");
     default:
@@ -1010,15 +1011,27 @@ yylex (yy::parser::semantic_type *)
 AT_BISON_CHECK([[-o input.cc input.yy]])
 
 AT_FOR_EACH_CXX([
-AT_LANG_COMPILE([[input]], [[input.cc scan.cc]])
+  AT_LANG_COMPILE([[input]], [[input.cc scan.cc]])
 
-AT_PARSER_CHECK([[./input]], [[0]], [[]],
+  # Leave enough valid tokens to make sure we recovered from the
+  # previous error, otherwise we might hide some error messages
+  # (discarded during error recovery).
+  AT_PARSER_CHECK([[echo "asaaalaa" | ./input ]], [[0]], [[]],
 [[error: invalid expression
 caught error
 error: invalid character
 caught error
 ]])
-])
+
+  AT_PARSER_CHECK([[echo "!as" | ./input ]], [1], [],
+[[error: invalid expression
+]])
+
+  AT_PARSER_CHECK([[echo "!al" | ./input ]], [1], [],
+[[error: invalid character
+]])
+
+]) # AT_FOR_EACH_CXX
 
 AT_BISON_OPTION_POPDEFS
 AT_CLEANUP
@@ -1029,6 +1042,8 @@ AT_TEST([%skeleton "glr.cc"])
 
 m4_popdef([AT_TEST])
 
+
+
 ## ------------------ ##
 ## Exception safety.  ##
 ## ------------------ ##
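
To illustrate the invariant described in the commit message, here is a
minimal standalone C++ sketch.  It is not the skeleton code: Scanner,
get_token, fetch_twice and the k* constants are hypothetical stand-ins
with illustrative values, and the toy scanner simply treats each
character as a token and throws on 'l'.  It only shows why marking a
scanner exception with the "empty" value makes the next call fetch a
fresh token, whereas a dedicated marker keeps returning the faulty one.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the skeleton's constants (illustrative values).
constexpr int kEmpty = -2;                   // "no token cached", like YYEMPTY
constexpr int kEof = 0;                      // end of input
constexpr int kMaxUserTok = 258;             // plays the role of YYMAXUTOK
constexpr int kFaultyTok = kMaxUserTok + 1;  // plays the role of YYFAULTYTOK

// Toy scanner: each character is a token; 'l' is a lexical error.
struct Scanner {
  std::string input;
  std::size_t pos = 0;

  int lex() {
    if (pos >= input.size())
      return kEof;
    const char c = input[pos++];
    if (c == 'l')
      throw std::runtime_error("invalid character");
    return static_cast<unsigned char>(c);
  }
};

// Fetch a token only if none is cached.  When the scanner throws,
// store `marker` in yychar instead of a real token number.
int get_token(int& yychar, Scanner& scanner, int marker) {
  if (yychar == kEmpty) {
    try {
      yychar = scanner.lex();
    } catch (const std::runtime_error&) {
      yychar = marker;
    }
  }
  return yychar;
}

// Call get_token twice in a row on the same lookahead and return the
// result of the second call.
int fetch_twice(const std::string& input, int marker) {
  Scanner scanner;
  scanner.input = input;
  int yychar = kEmpty;
  get_token(yychar, scanner, marker);
  return get_token(yychar, scanner, marker);
}
```

With kEmpty as the marker, fetch_twice("l", kEmpty) yields kEof: the
faulty token is dropped and end of input is read in its place, which
is consistent with the accepted "2 + 2 %" input in the report.  With
kFaultyTok as the marker, the second call returns kFaultyTok again and
the scanner is not consulted a second time.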



