m4-patches

token recognition order


From: Eric Blake
Subject: token recognition order
Date: Tue, 17 Feb 2009 07:12:25 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.19) Gecko/20081209 Thunderbird/2.0.0.19 Mnenhy/0.7.6.666

The 1.4.x manual states that GNU m4 recognizes comments differently than
other m4 implementations, and that things would change in the future.  The
master branch has already made the change; I'm now porting it to
branch-1.6 as well:

--
Don't work too hard, make some time for fun as well!

Eric Blake             address@hidden
From 16e712b9dbcfcc49a54dd7c010ca1cab075fd79a Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Tue, 17 Feb 2009 07:08:55 -0700
Subject: [PATCH] Reorder token recognition to match other implementations.

* src/input.c (next_token): Recognize comments after quotes, but
before macro arguments.
* doc/m4.texinfo (Changecom): Document this.
* NEWS: Likewise.

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog      |    8 +++
 NEWS           |    4 ++
 doc/m4.texinfo |   38 +++++++++++---
 src/input.c    |  158 ++++++++++++++++++++++++++++----------------------------
 4 files changed, 121 insertions(+), 87 deletions(-)
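
Note for readers: the reordering described in the log message can be
sketched as a small standalone classifier. This is only an illustration
under stated assumptions, not the code in src/input.c: the names
`kind', `starts_with' and `classify' are hypothetical, changeword is
assumed to be disabled, and the real next_token also handles end of
file, macro tokens, and argv references before reaching these checks.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical token kinds for this sketch; these are not the
   token_type values used by src/input.c.  */
enum kind { KIND_WORD, KIND_QUOTE, KIND_COMMENT, KIND_SIMPLE };

/* Return nonzero if the text at S begins with the non-empty
   delimiter DELIM.  An empty delimiter never matches.  */
static int
starts_with (const char *s, const char *delim)
{
  size_t n;
  assert (s && delim);
  n = strlen (delim);
  return n > 0 && strncmp (s, delim, n) == 0;
}

/* Classify the token that starts at S, checking in the order the
   patch introduces: macro-name word first, then start-quote, then
   start-comment, and finally a simple one-character token.  */
static enum kind
classify (const char *s, const char *start_quote, const char *start_comm)
{
  unsigned char ch = (unsigned char) *s;

  if (isalpha (ch) || ch == '_')
    return KIND_WORD;
  if (starts_with (s, start_quote))
    return KIND_QUOTE;
  if (starts_with (s, start_comm))
    return KIND_COMMENT;
  return KIND_SIMPLE;
}
```

With start-comment `q', classify ("q hi Q", "`", "q") yields KIND_WORD,
so the comment never starts; with start-quote `[[' and start-comment
`[[[', classify ("[[[hi]]]", "[[", "[[[") yields KIND_QUOTE, which is
why comments are effectively disabled in that configuration.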

diff --git a/ChangeLog b/ChangeLog
index 88a3723..85f2c5b 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,11 @@
+2009-02-17  Eric Blake  <address@hidden>
+
+       Reorder token recognition to match other implementations.
+       * src/input.c (next_token): Recognize comments after quotes, but
+       before macro arguments.
+       * doc/m4.texinfo (Changecom): Document this.
+       * NEWS: Likewise.
+
 2009-02-16  Eric Blake  <address@hidden>

        Stage 29: Process input by buffer, not bytes.
diff --git a/NEWS b/NEWS
index 69c0bb8..d4839f5 100644
--- a/NEWS
+++ b/NEWS
@@ -50,6 +50,10 @@ Software Foundation, Inc.
    then apply this patch:
      http://git.sv.gnu.org/gitweb/?p=autoconf.git;a=commitdiff;h=56d42fa71

+** The `changecom' builtin semantics now match traditional
+   implementations; if the start-comment string resembles a macro name or
+   the start-quote string, comments are effectively disabled.
+
 ** The `divert' builtin now accepts an optional second argument of text
    that is immediately placed in the new diversion, regardless of whether
    the current expansion is nested within argument collection of another
diff --git a/doc/m4.texinfo b/doc/m4.texinfo
index 10fa4d2..3da0443 100644
--- a/doc/m4.texinfo
+++ b/doc/m4.texinfo
@@ -4932,13 +4932,15 @@ Changecom
 of any length.  Other implementations cap the delimiter length to five
 characters, but @acronym{GNU} has no inherent limit.

-Comments are recognized in preference to macros.  However, this is not
-compatible with other implementations, where macros and even quoting
-takes precedence over comments, so it may change in a future release.
-For portability, this means that @var{start} should not begin with a
-letter, digit, or @samp{_} (underscore), and that neither the
-start-quote nor the start-comment string should be a prefix of the
-other.
+As of M4 1.6, macros and quotes are recognized in preference to
+comments, so if a prefix of @var{start} can be recognized as part of a
+potential macro name, or confused with a quoted string, the comment
+mechanism is effectively disabled (earlier versions of @acronym{GNU} M4
+favored comments, but this was inconsistent with other implementations).
+Unless you use @code{changeword} (@pxref{Changeword}), this means
+that @var{start} should not begin with a letter, digit, or @samp{_}
+(underscore), and that neither the start-quote nor the start-comment
+string should be a prefix of the other.

 @example
 define(`hi', `HI')
@@ -4948,13 +4950,33 @@ Changecom
 changecom(`q', `Q')
 @result{}
 q hi Q hi
-@result{}q hi Q HI
+@result{}q HI Q HI
 changecom(`1', `2')
 @result{}
 hi1hi2
 @result{}hello
 hi 1hi2
 @result{}HI 1hi2
+changecom(`[[', `]]')
+@result{}
+changequote(`[[[', `]]]')
+@result{}
+[hi]
+@result{}[HI]
+[[hi]]
+@result{}[[hi]]
+[[[hi]]]
+@result{}hi
+changequote
+@result{}
+changecom(`[[[', `]]]')
+@result{}
+changequote(`[[', `]]')
+@result{}
+[[hi]]
+@result{}hi
+[[[hi]]]
+@result{}[hi]
 @end example

 Comments are recognized in preference to argument collection.  In
diff --git a/src/input.c b/src/input.c
index 2acbd70..709ef3e 100644
--- a/src/input.c
+++ b/src/input.c
@@ -1864,64 +1864,7 @@ next_token (token_data *td, int *line, struct obstack *obs, bool allow_argv,
       return TOKEN_ARGV;
     }

-  if (MATCH (ch, curr_comm.str1, curr_comm.len1, true))
-    {
-      if (obs)
-       obs_td = obs;
-      obstack_grow (obs_td, curr_comm.str1, curr_comm.len1);
-      while (1)
-       {
-         /* Start with buffer search for potential end delimiter.  */
-         size_t len;
-         const char *buffer = next_buffer (&len, false);
-         if (buffer)
-           {
-             const char *p = (char *) memchr (buffer, *curr_comm.str2, len);
-             if (p)
-               {
-                 obstack_grow (obs_td, buffer, p - buffer);
-                 ch = to_uchar (*p);
-                 consume_buffer (p - buffer + 1);
-               }
-             else
-               {
-                 obstack_grow (obs_td, buffer, len);
-                 consume_buffer (len);
-                 continue;
-               }
-           }
-
-         /* Fall back to byte-wise search.  */
-         else
-           ch = next_char (false, false);
-         if (ch == CHAR_EOF)
-           {
-             /* Current_file changed to "" if we see CHAR_EOF, use
-                the previous value we stored earlier.  */
-             if (!caller)
-               {
-                 assert (line);
-                 current_line = *line;
-                 current_file = file;
-               }
-             m4_error (EXIT_FAILURE, 0, caller, _("end of file in comment"));
-           }
-         if (ch == CHAR_MACRO)
-           {
-             init_macro_token (obs, obs ? td : NULL);
-             continue;
-           }
-         if (MATCH (ch, curr_comm.str2, curr_comm.len2, true))
-           {
-             obstack_grow (obs_td, curr_comm.str2, curr_comm.len2);
-             break;
-           }
-         assert (ch < CHAR_EOF);
-         obstack_1grow (obs_td, ch);
-       }
-      type = TOKEN_COMMENT;
-    }
-  else if (default_word_regexp && (isalpha (ch) || ch == '_'))
+  if (default_word_regexp && (isalpha (ch) || ch == '_'))
     {
       obstack_1grow (&token_stack, ch);
       while (1)
@@ -1996,27 +1939,7 @@ next_token (token_data *td, int *line, struct obstack *obs, bool allow_argv,

 #endif /* ENABLE_CHANGEWORD */

-  else if (!MATCH (ch, curr_quote.str1, curr_quote.len1, true))
-    {
-      assert (ch < CHAR_EOF);
-      switch (ch)
-       {
-       case '(':
-         type = TOKEN_OPEN;
-         break;
-       case ',':
-         type = TOKEN_COMMA;
-         break;
-       case ')':
-         type = TOKEN_CLOSE;
-         break;
-       default:
-         type = TOKEN_SIMPLE;
-         break;
-       }
-      obstack_1grow (&token_stack, ch);
-    }
-  else
+  else if (MATCH (ch, curr_quote.str1, curr_quote.len1, true))
     {
       if (obs)
        obs_td = obs;
@@ -2096,6 +2019,83 @@ next_token (token_data *td, int *line, struct obstack *obs, bool allow_argv,
            }
        }
     }
+  else if (MATCH (ch, curr_comm.str1, curr_comm.len1, true))
+    {
+      if (obs)
+       obs_td = obs;
+      obstack_grow (obs_td, curr_comm.str1, curr_comm.len1);
+      while (1)
+       {
+         /* Start with buffer search for potential end delimiter.  */
+         size_t len;
+         const char *buffer = next_buffer (&len, false);
+         if (buffer)
+           {
+             const char *p = (char *) memchr (buffer, *curr_comm.str2, len);
+             if (p)
+               {
+                 obstack_grow (obs_td, buffer, p - buffer);
+                 ch = to_uchar (*p);
+                 consume_buffer (p - buffer + 1);
+               }
+             else
+               {
+                 obstack_grow (obs_td, buffer, len);
+                 consume_buffer (len);
+                 continue;
+               }
+           }
+
+         /* Fall back to byte-wise search.  */
+         else
+           ch = next_char (false, false);
+         if (ch == CHAR_EOF)
+           {
+             /* Current_file changed to "" if we see CHAR_EOF, use
+                the previous value we stored earlier.  */
+             if (!caller)
+               {
+                 assert (line);
+                 current_line = *line;
+                 current_file = file;
+               }
+             m4_error (EXIT_FAILURE, 0, caller, _("end of file in comment"));
+           }
+         if (ch == CHAR_MACRO)
+           {
+             init_macro_token (obs, obs ? td : NULL);
+             continue;
+           }
+         if (MATCH (ch, curr_comm.str2, curr_comm.len2, true))
+           {
+             obstack_grow (obs_td, curr_comm.str2, curr_comm.len2);
+             break;
+           }
+         assert (ch < CHAR_EOF);
+         obstack_1grow (obs_td, ch);
+       }
+      type = TOKEN_COMMENT;
+    }
+  else
+    {
+      assert (ch < CHAR_EOF);
+      switch (ch)
+       {
+       case '(':
+         type = TOKEN_OPEN;
+         break;
+       case ',':
+         type = TOKEN_COMMA;
+         break;
+       case ')':
+         type = TOKEN_CLOSE;
+         break;
+       default:
+         type = TOKEN_SIMPLE;
+         break;
+       }
+      obstack_1grow (&token_stack, ch);
+    }

   if (TOKEN_DATA_TYPE (td) == TOKEN_VOID)
     {
-- 
1.6.1.2

