bug-gnu-emacs

bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.


From: Alan Mackenzie
Subject: bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
Date: Thu, 19 Nov 2020 21:18:22 +0000

Hello, Stefan.

On Thu, Sep 24, 2020 at 12:56:42 -0400, Stefan Monnier wrote:
> > As already said, this is a(n ugly) workaround.  syntax.c should handle
> > comments in all their generality.  With a bit of consideration, the
> > method to do this is clear:

> In my world, it's quite normal for a specific language's lexical rules
> not to line up 100% with syntax tables (whether for strings, comments,
> younameit).  I don't see anything very special here.

> A `syntax-propertize` rule for "\*/" should be very easy to implement
> and fairly cheap since the regexp is simple and will almost never match.

> So, yeah, you can add yet-another-hack on top of the other syntax.c
> hacks if you want, but there's a good chance it will only ever be used
> by CC-mode.  It will take a lot more code changes in syntax.c than
> a quick tweak to your Elisp code to search for "\*/".

> I do think it would be good to handle this without `syntax-table`
> text-property hacks, but I think that should come with an overhaul of
> syntax.c based on a major-mode provided DFA (or something like that) so
> it can accommodate all the various oddball cases without even the need
> to introduce the notion of escaping comment markers.

OK, here's the patch.  As a matter of interest, it has been heavily tested
with the unit tests in .../test/src/syntax-tests.el; further enhancements
to those tests are part of the patch.

As a reminder, the motivation is to have syntax.c correctly parse C/C++
line comments which are continued by an escaped newline, such as:

    foo(); // comment \\
    second line of comment.

by introducing a new syntax flag "e" as a modifier on the syntax entry
for \n:

    (modify-syntax-entry ?\n "> be")
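
To make the intended behaviour concrete, here is a minimal standalone
sketch in C.  It is not code from syntax.c: skip_line_comment is a
hypothetical helper, and it assumes that a backslash directly before the
newline is the only thing which continues a line comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only (not part of syntax.c).
   TEXT[START] is the first character of a line comment.  Return the
   index just past the newline which really ends that comment, or LEN
   if the comment runs to the end of TEXT.  A newline immediately
   preceded by a backslash is taken as escaped and does not end the
   comment.  */
static size_t
skip_line_comment (const char *text, size_t len, size_t start)
{
  assert (start <= len);
  for (size_t i = start; i < len; i++)
    if (text[i] == '\n' && !(i > start && text[i - 1] == '\\'))
      return i + 1;
  return len;
}
```

Called at the "//" of the example above, this returns the position just
after "second line of comment." and its newline, so both lines are
treated as a single comment.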

>         Stefan



diff --git a/src/syntax.c b/src/syntax.c
index df07809aaa..c701729ba1 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -108,6 +108,11 @@ SYNTAX_FLAGS_COMMENT_NESTED (int flags)
 {
   return (flags >> 22) & 1;
 }
+static bool
+SYNTAX_FLAGS_COMMENT_ESCAPES (int flags)
+{
+  return (flags >> 24) & 1;
+}
 
 /* FLAGS should be the flags of the main char of the comment marker, e.g.
    the second for comstart and the first for comend.  */
@@ -673,6 +678,26 @@ prev_char_comend_first (ptrdiff_t pos, ptrdiff_t pos_byte)
   return val;
 }
 
+static bool
+comment_ender_quoted (ptrdiff_t from, ptrdiff_t from_byte, int syntax)
+{
+  int c;
+  int next_syntax;
+  if (comment_end_can_be_escaped && char_quoted (from, from_byte))
+    return true;
+  if (SYNTAX_FLAGS_COMMENT_ESCAPES (syntax))
+    {
+      dec_both (&from, &from_byte);
+      UPDATE_SYNTAX_TABLE_BACKWARD (from);
+      c = FETCH_CHAR_AS_MULTIBYTE (from_byte);
+      next_syntax = SYNTAX_WITH_FLAGS (c);
+      UPDATE_SYNTAX_TABLE_FORWARD (from + 1);
+      if (next_syntax == Sescape || next_syntax == Scharquote)
+        return true;
+    }
+  return false;
+}
+
 /* Check whether charpos FROM is at the end of a comment.
    FROM_BYTE is the bytepos corresponding to FROM.
    Do not move back before STOP.
@@ -755,6 +780,20 @@ back_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
                 && SYNTAX_FLAGS_COMEND_SECOND (prev_syntax));
       comstart = (com2start || code == Scomment);
 
+      /* Check for any current delimiter being escaped.  */
+      if (from > stop
+          && (((com2end || code == Sendcomment)
+               && comment_ender_quoted (from, from_byte, syntax))
+              || (code == Scomment
+                  && comment_end_can_be_escaped
+                  && char_quoted (from, from_byte))))
+        {
+          dec_both (&from, &from_byte);
+          UPDATE_SYNTAX_TABLE_BACKWARD (from);
+          com2end = comstart = com2start = 0;
+          syntax = Smax;
+        }
+
       /* Nasty cases with overlapping 2-char comment markers:
         - snmp-mode: -- c -- foo -- c --
                      --- c --
@@ -1191,6 +1230,10 @@ the value of a `syntax-table' text property.  */)
       case 'c':
        val |= 1 << 23;
        break;
+
+      case 'e':
+        val |= 1 << 24;
+        break;
       }
 
   if (val < ASIZE (Vsyntax_code_object) && NILP (match))
@@ -1279,7 +1322,8 @@ DEFUN ("internal-describe-syntax-value", Finternal_describe_syntax_value,
   (Lisp_Object syntax)
 {
   int code, syntax_code;
-  bool start1, start2, end1, end2, prefix, comstyleb, comstylec, comnested;
+  bool start1, start2, end1, end2, prefix, comstyleb, comstylec, comnested,
+    comescapes;
   char str[2];
   Lisp_Object first, match_lisp, value = syntax;
 
@@ -1320,6 +1364,7 @@ DEFUN ("internal-describe-syntax-value", Finternal_describe_syntax_value,
   comstyleb = SYNTAX_FLAGS_COMMENT_STYLEB (syntax_code);
   comstylec = SYNTAX_FLAGS_COMMENT_STYLEC (syntax_code);
   comnested = SYNTAX_FLAGS_COMMENT_NESTED (syntax_code);
+  comescapes = SYNTAX_FLAGS_COMMENT_ESCAPES (syntax_code);
 
   if (Smax <= code)
     {
@@ -1353,6 +1398,8 @@ DEFUN ("internal-describe-syntax-value", Finternal_describe_syntax_value,
     insert ("c", 1);
   if (comnested)
     insert ("n", 1);
+  if (comescapes)
+    insert ("e", 1);
 
   insert_string ("\twhich means: ");
 
@@ -1416,6 +1463,8 @@ DEFUN ("internal-describe-syntax-value", Finternal_describe_syntax_value,
     insert_string (" (comment style c)");
   if (comnested)
     insert_string (" (nestable)");
+  if (comescapes)
+    insert_string (" (can be escaped)");
 
   if (prefix)
     {
@@ -2336,7 +2385,7 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
          && SYNTAX_FLAGS_COMMENT_STYLE (syntax, 0) == style
          && (SYNTAX_FLAGS_COMMENT_NESTED (syntax) ?
              (nesting > 0 && --nesting == 0) : nesting < 0)
-          && !(comment_end_can_be_escaped && char_quoted (from, from_byte)))
+          && !comment_ender_quoted (from, from_byte, syntax))
        /* We have encountered a comment end of the same style
           as the comment sequence which began this comment
           section.  */
@@ -2354,12 +2403,12 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
        /* We have encountered a nested comment of the same style
           as the comment sequence which began this comment section.  */
        nesting++;
-      if (comment_end_can_be_escaped
-          && (code == Sescape || code == Scharquote))
+      if (SYNTAX_FLAGS_COMEND_FIRST (syntax)
+          && comment_ender_quoted (from, from_byte, syntax))
         {
           inc_both (&from, &from_byte);
           UPDATE_SYNTAX_TABLE_FORWARD (from);
-          if (from == stop) continue; /* Failure */
+          continue;
         }
       inc_both (&from, &from_byte);
       UPDATE_SYNTAX_TABLE_FORWARD (from);
@@ -2493,8 +2542,8 @@ between them, return t; otherwise return nil.  */)
       /* We're at the start of a comment.  */
       found = forw_comment (from, from_byte, stop, comnested, comstyle, 0,
                            &out_charpos, &out_bytepos, &dummy, &dummy2);
-      from = out_charpos; from_byte = out_bytepos;
-      if (!found)
+      from = out_charpos; from_byte = out_bytepos; 
+     if (!found)
        {
          SET_PT_BOTH (from, from_byte);
          return Qnil;
@@ -2526,21 +2575,27 @@ between them, return t; otherwise return nil.  */)
          if (code == Sendcomment)
            comstyle = SYNTAX_FLAGS_COMMENT_STYLE (syntax, 0);
          if (from > stop && SYNTAX_FLAGS_COMEND_SECOND (syntax)
-             && prev_char_comend_first (from, from_byte)
-             && !char_quoted (from - 1, dec_bytepos (from_byte)))
+             && prev_char_comend_first (from, from_byte))
            {
              int other_syntax;
-             /* We must record the comment style encountered so that
+              /* We must record the comment style encountered so that
                 later, we can match only the proper comment begin
                 sequence of the same style.  */
              dec_both (&from, &from_byte);
-             code = Sendcomment;
-             /* Calling char_quoted, above, set up global syntax position
-                at the new value of FROM.  */
              c1 = FETCH_CHAR_AS_MULTIBYTE (from_byte);
              other_syntax = SYNTAX_WITH_FLAGS (c1);
-             comstyle = SYNTAX_FLAGS_COMMENT_STYLE (other_syntax, syntax);
-             comnested |= SYNTAX_FLAGS_COMMENT_NESTED (other_syntax);
+              if (!comment_ender_quoted (from, from_byte, other_syntax))
+                {
+                  code = Sendcomment;
+                  comstyle = SYNTAX_FLAGS_COMMENT_STYLE (other_syntax, syntax);
+                  comnested |= SYNTAX_FLAGS_COMMENT_NESTED (other_syntax);
+                  syntax = other_syntax;
+                }
+              else
+                {
+                  inc_both (&from, &from_byte);
+                  UPDATE_SYNTAX_TABLE_FORWARD (from);
+                }
            }
 
          if (code == Scomment_fence)
@@ -2579,7 +2634,8 @@ between them, return t; otherwise return nil.  */)
            }
          else if (code == Sendcomment)
            {
-              found = (!quoted || !comment_end_can_be_escaped)
+              found =
+                !comment_ender_quoted (from, from_byte, syntax)
                 && back_comment (from, from_byte, stop, comnested, comstyle,
                                  &out_charpos, &out_bytepos);
              if (!found)
@@ -2864,6 +2920,7 @@ scan_lists (EMACS_INT from0, EMACS_INT count, EMACS_INT depth, bool sexpflag)
              other_syntax = SYNTAX_WITH_FLAGS (c2);
              comstyle = SYNTAX_FLAGS_COMMENT_STYLE (other_syntax, syntax);
              comnested |= SYNTAX_FLAGS_COMMENT_NESTED (other_syntax);
+              syntax = other_syntax;
            }
 
          /* Quoting turns anything except a comment-ender
@@ -2946,7 +3003,10 @@ scan_lists (EMACS_INT from0, EMACS_INT count, EMACS_INT depth, bool sexpflag)
            case Sendcomment:
              if (!parse_sexp_ignore_comments)
                break;
-             found = back_comment (from, from_byte, stop, comnested, comstyle,
+             found =
+                (from == stop
+                 || !comment_ender_quoted (from, from_byte, syntax))
+                && back_comment (from, from_byte, stop, comnested, comstyle,
                                    &out_charpos, &out_bytepos);
              /* FIXME:  if !found, it really wasn't a comment-end.
                 For single-char Sendcomment, we can't do much about it apart
diff --git a/test/src/syntax-resources/syntax-comments.txt b/test/src/syntax-resources/syntax-comments.txt
index a292d816b9..f3357ea244 100644
--- a/test/src/syntax-resources/syntax-comments.txt
+++ b/test/src/syntax-resources/syntax-comments.txt
@@ -34,7 +34,7 @@
 54{ //74 \
 }54
 55{/* */}55
-56{ /*76 \*/ }56
+56{ /*76 \*/80 }56
 57*/77
 58}58
 60{ /*78 \\*/79}60
@@ -87,6 +87,21 @@
 110
 111#| ; |#111
 
+/* Comments and purported comments containing string delimiters. */
+120/* "string" */120
+121/* "" */121
+122/* " */122
+130/*
+" " */130
+" "*/123
+124/* " ' */124
+126/*
+" ' */126
+127/* " " " " " */127
+128/* " ' "  ' " ' */128
+129/*   ' "  ' " ' */129
+" ' */125
+
 Local Variables:
 mode: fundamental
 eval: (set-syntax-table (make-syntax-table))
diff --git a/test/src/syntax-tests.el b/test/src/syntax-tests.el
index edee01ec58..399986c31d 100644
--- a/test/src/syntax-tests.el
+++ b/test/src/syntax-tests.el
@@ -307,6 +307,7 @@ syntax-pps-comments
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 (defun {-in ()
   (setq parse-sexp-ignore-comments t)
+  (setq comment-use-syntax-ppss nil)
   (setq comment-end-can-be-escaped nil)
   (modify-syntax-entry ?{ "<")
   (modify-syntax-entry ?} ">"))
@@ -336,6 +337,7 @@ {-out
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 (defun \;-in ()
   (setq parse-sexp-ignore-comments t)
+  (setq comment-use-syntax-ppss nil)
   (setq comment-end-can-be-escaped nil)
   (modify-syntax-entry ?\n ">")
   (modify-syntax-entry ?\; "<")
@@ -375,6 +377,7 @@ \;-out
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 (defun \#|-in ()
   (setq parse-sexp-ignore-comments t)
+  (setq comment-use-syntax-ppss nil)
   (modify-syntax-entry ?# ". 14")
   (modify-syntax-entry ?| ". 23n")
   (modify-syntax-entry ?\; "< b")
@@ -418,15 +421,18 @@ \#|-out
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 (defun /*-in ()
   (setq parse-sexp-ignore-comments t)
+  (setq comment-use-syntax-ppss nil)
   (setq comment-end-can-be-escaped t)
   (modify-syntax-entry ?/ ". 124b")
   (modify-syntax-entry ?* ". 23")
-  (modify-syntax-entry ?\n "> b"))
+  (modify-syntax-entry ?\n "> b")
+  (modify-syntax-entry ?\' "\""))
 (defun /*-out ()
   (setq comment-end-can-be-escaped nil)
   (modify-syntax-entry ?/ ".")
   (modify-syntax-entry ?* ".")
-  (modify-syntax-entry ?\n " "))
+  (modify-syntax-entry ?\n " ")
+  (modify-syntax-entry ?\' "."))
 (eval-and-compile
   (setq syntax-comments-section "c"))
 
@@ -489,4 +495,142 @@ /*-out
 (syntax-pps-comments /* 56 76 77 58)
 (syntax-pps-comments /* 60 78 79)
 
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;; Emacs 28 "C" style comments - `comment-end-can-be-escaped' is nil, the
+;; "e" flag is used for line comments.
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+(defun //-in ()
+  (setq parse-sexp-ignore-comments t)
+  (setq comment-use-syntax-ppss nil)
+  (modify-syntax-entry ?/ ". 124be")
+  (modify-syntax-entry ?* ". 23")
+  (modify-syntax-entry ?\n "> be")
+  (modify-syntax-entry ?\' "\""))
+(defun //-out ()
+  (modify-syntax-entry ?/ ".")
+  (modify-syntax-entry ?* ".")
+  (modify-syntax-entry ?\n " ")
+  (modify-syntax-entry ?\' "."))
+(eval-and-compile
+  (setq syntax-comments-section "c++"))
+
+(syntax-comments // forward t 1)
+(syntax-comments // backward t 1)
+(syntax-comments // forward t 2)
+(syntax-comments // backward t 2)
+(syntax-comments // forward t 3)
+(syntax-comments // backward t 3)
+
+(syntax-comments // forward t 4)
+(syntax-comments // backward t 4)
+(syntax-comments // forward t 5 6)
+(syntax-comments // backward nil 5 0)
+(syntax-comments // forward nil 6 0)
+(syntax-comments // backward t 6 5)
+
+(syntax-comments // forward t 7)
+(syntax-comments // backward t 7)
+(syntax-comments // forward nil 8 0)
+(syntax-comments // backward nil 8 0)
+(syntax-comments // forward t 9)
+(syntax-comments // backward t 9)
+
+(syntax-comments // forward nil 10 0)
+(syntax-comments // backward nil 10 0)
+(syntax-comments // forward t 11)
+(syntax-comments // backward t 11)
+
+(syntax-comments // forward t 13)
+(syntax-comments // backward t 13)
+(syntax-comments // forward t 15)
+(syntax-comments // backward t 15)
+
+;; Emacs 28 "C" style comments inside brace lists.
+(syntax-br-comments // forward t 50)
+(syntax-br-comments // backward t 50)
+(syntax-br-comments // forward t 51)
+(syntax-br-comments // backward t 51)
+(syntax-br-comments // forward t 52)
+(syntax-br-comments // backward t 52)
+
+(syntax-br-comments // forward t 53)
+(syntax-br-comments // backward t 53)
+(syntax-br-comments // forward t 54 58)
+(syntax-br-comments // backward t 54)
+(syntax-br-comments // forward t 55)
+(syntax-br-comments // backward t 55)
+
+(syntax-br-comments // forward t 56 56)
+(syntax-br-comments // backward t 58 54)
+(syntax-br-comments // backward nil 59)
+(syntax-br-comments // forward t 60)
+(syntax-br-comments // backward t 60)
+
+;; Emacs 28 "C" style comments parsed by `parse-partial-sexp'.
+(syntax-pps-comments // 50 70 71)
+(syntax-pps-comments // 52 72 73)
+(syntax-pps-comments // 54 74 55 58)
+(syntax-pps-comments // 56 76 80)
+(syntax-pps-comments // 60 78 79)
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;; Comments containing string delimiters.
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+(eval-and-compile
+  (setq syntax-comments-section "c-\""))
+
+(syntax-comments /* forward t 120)
+(syntax-comments /* backward t 120)
+(syntax-comments /* forward t 121)
+(syntax-comments /* backward t 121)
+(syntax-comments /* forward t 122)
+(syntax-comments /* backward t 122)
+
+(syntax-comments /* backward nil 123 0)
+(syntax-comments /* forward t 124)
+(syntax-comments /* backward t 124)
+(syntax-comments /* backward nil 125 0)
+(syntax-comments /* forward t 126)
+(syntax-comments /* backward t 126)
+
+(syntax-comments /* forward t 127)
+(syntax-comments /* backward t 127)
+(syntax-comments /* forward t 128)
+(syntax-comments /* backward t 128)
+(syntax-comments /* forward t 129)
+(syntax-comments /* backward t 129)
+
+(syntax-comments /* forward t 130)
+(syntax-comments /* backward t 130)
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;; The same again, with Emacs 28 style C comments.
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+(eval-and-compile
+  (setq syntax-comments-section "c++-\""))
+
+(syntax-comments // forward t 120)
+(syntax-comments // backward t 120)
+(syntax-comments // forward t 121)
+(syntax-comments // backward t 121)
+(syntax-comments // forward t 122)
+(syntax-comments // backward t 122)
+
+(syntax-comments // backward nil 123 0)
+(syntax-comments // forward t 124)
+(syntax-comments // backward t 124)
+(syntax-comments // backward nil 125 0)
+(syntax-comments // forward t 126)
+(syntax-comments // backward t 126)
+
+(syntax-comments // forward t 127)
+(syntax-comments // backward t 127)
+(syntax-comments // forward t 128)
+(syntax-comments // backward t 128)
+(syntax-comments // forward t 129)
+(syntax-comments // backward t 129)
+
+(syntax-comments // forward t 130)
+(syntax-comments // backward t 130)
+
 ;;; syntax-tests.el ends here


-- 
Alan Mackenzie (Nuremberg, Germany).




