m4 regex usage [was: Multi-Line Definitions]
From: Eric Blake
Subject: m4 regex usage [was: Multi-Line Definitions]
Date: Sat, 29 Sep 2007 18:01:20 -0600
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070728 Thunderbird/2.0.0.6 Mnenhy/0.7.5.666
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
[adding m4-patches; this branch of the thread can drop other lists]
According to Eric Blake on 9/29/2007 1:31 PM:
> Here's something a bit more telling. With the attached patch, and in the
> coreutils directory,
>
> $ M4_TRACE_FILE=~/m4.trace M4=~/m4/src/m4 autoconf
I tweaked my tracer patch a bit to distinguish between patsubst and regexp.
$ sort <m4.trace | uniq -c |sort -n -k1,1 |tail -n 15
...
1214 p:
1596
...
So half of the empty lines in my trace actually did a multi-line regex.
But 1214 of them did a patsubst(string, [], []), and m4 wasted time
compiling the empty regex every one of those times! I'm applying this to
m4 so that autoconf < 2.62 sees some benefit when run with m4 > 1.4.10
(watch for a followup to autoconf that avoids the empty regex to begin with).
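
For anyone without the m4 sources at hand, here is a minimal standalone sketch of the same shortcut. It is only an illustration under some assumptions: it uses POSIX regcomp/regexec rather than the GNU re_compile_pattern interface that m4 uses, and first_match, delete_matches, and compile_count are hypothetical names that do not exist in m4.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counter, for illustration only: how many times we
   paid for a regex compilation.  */
static unsigned long compile_count;

/* Return the offset of the first match of PATTERN in TEXT, or -1 if
   there is none (or PATTERN fails to compile).  An empty pattern
   matches at offset 0 of any string, so answer without compiling.  */
static long
first_match (const char *text, const char *pattern)
{
  regex_t re;
  regmatch_t m;
  long result = -1;

  if (!*pattern)
    return 0;                   /* fast path */

  compile_count++;
  if (regcomp (&re, pattern, 0) != 0)
    return -1;
  if (regexec (&re, text, 1, &m, 0) == 0)
    result = (long) m.rm_so;
  regfree (&re);
  return result;
}

/* Return a freshly allocated copy of TEXT with every match of PATTERN
   deleted, or NULL on allocation or compile failure.  Deleting the
   empty string changes nothing, so an empty pattern just copies.  */
static char *
delete_matches (const char *text, const char *pattern)
{
  size_t len = strlen (text);
  char *out = malloc (len + 1);
  size_t used = 0;
  size_t offset = 0;
  regex_t re;
  regmatch_t m;

  if (!out)
    return NULL;
  if (!*pattern)
    {
      memcpy (out, text, len + 1);      /* fast path */
      return out;
    }

  compile_count++;
  if (regcomp (&re, pattern, 0) != 0)
    {
      free (out);
      return NULL;
    }
  while (offset <= len
         && regexec (&re, text + offset, 1, &m,
                     offset ? REG_NOTBOL : 0) == 0)
    {
      /* Keep the text before the match, then skip the match itself.  */
      memcpy (out + used, text + offset, (size_t) m.rm_so);
      used += (size_t) m.rm_so;
      offset += (size_t) m.rm_eo;
      if (m.rm_so == m.rm_eo)
        {
          /* Empty match: keep one character so the loop advances.  */
          if (offset < len)
            out[used++] = text[offset];
          offset++;
        }
    }
  if (offset < len)
    {
      memcpy (out + used, text + offset, len - offset);
      used += len - offset;
    }
  regfree (&re);
  out[used] = '\0';
  return out;
}
```

Calling delete_matches ("hello", "") or first_match ("hello", "") leaves compile_count untouched, while a non-empty pattern such as "[0-9]" still goes through regcomp. The caller owns the string returned by delete_matches and must free it.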
2007-09-29 Eric Blake <address@hidden>
Optimize for Autoconf usage pattern.
* src/builtin.c (m4_regexp, m4_patsubst): Handle empty regex
faster.
- --
Don't work too hard, make some time for fun as well!
Eric Blake address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFG/udP84KuGfSFAYARAmHzAJwJO8+zwXssS/qlIEfotONpp/epRgCfQgQ3
Rjq/NWvO4ha9S+o3gpv9gdg=
=BSOA
-----END PGP SIGNATURE-----
From aa46ced67010190918295b965f5e2879dcd9a30c Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Sat, 29 Sep 2007 17:48:29 -0600
Subject: [PATCH] Optimize for Autoconf usage pattern.
* src/builtin.c (m4_regexp, m4_patsubst): Handle empty regex
faster.
Signed-off-by: Eric Blake <address@hidden>
---
ChangeLog | 6 ++++++
src/builtin.c | 36 +++++++++++++++++++++++++++---------
2 files changed, 33 insertions(+), 9 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index 0cea5b5..f29b557 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,9 @@
+2007-09-29 Eric Blake <address@hidden>
+
+ Optimize for Autoconf usage pattern.
+ * src/builtin.c (m4_regexp, m4_patsubst): Handle empty regex
+ faster.
+
2007-09-24 Eric Blake <address@hidden>
Create .gitignore alongside .cvsignore.
diff --git a/src/builtin.c b/src/builtin.c
index dee2276..65f4585 100644
--- a/src/builtin.c
+++ b/src/builtin.c
@@ -1968,8 +1968,19 @@ m4_regexp (struct obstack *obs, int argc, token_data **argv)
return;
}
- victim = TOKEN_DATA_TEXT (argv[1]);
- regexp = TOKEN_DATA_TEXT (argv[2]);
+ victim = ARG (1);
+ regexp = ARG (2);
+ repl = ARG (3);
+
+ if (!*regexp)
+ {
+ /* The empty regex matches everything! */
+ if (argc == 3)
+ shipout_int (obs, 0);
+ else
+ obstack_grow (obs, repl, strlen (repl));
+ return;
+ }
init_pattern_buffer (&buf, &regs);
msg = re_compile_pattern (regexp, strlen (regexp), &buf);
@@ -1993,10 +2004,7 @@ m4_regexp (struct obstack *obs, int argc, token_data **argv)
else if (argc == 3)
shipout_int (obs, startpos);
else if (startpos >= 0)
- {
- repl = TOKEN_DATA_TEXT (argv[3]);
- substitute (obs, victim, repl, &regs);
- }
+ substitute (obs, victim, repl, &regs);
free_pattern_buffer (&buf, &regs);
}
@@ -2013,6 +2021,7 @@ m4_patsubst (struct obstack *obs, int argc, token_data **argv)
{
const char *victim; /* first argument */
const char *regexp; /* regular expression */
+ const char *repl;
struct re_pattern_buffer buf; /* compiled regular expression */
struct re_registers regs; /* for subexpression matches */
@@ -2029,7 +2038,17 @@ m4_patsubst (struct obstack *obs, int argc, token_data **argv)
return;
}
- regexp = TOKEN_DATA_TEXT (argv[2]);
+ victim = ARG (1);
+ regexp = ARG (2);
+ repl = ARG (3);
+
+ /* The empty regex matches everywhere, but if there is no
+ replacement, we need not waste time with it. */
+ if (!*regexp && !*repl)
+ {
+ obstack_grow (obs, victim, strlen (victim));
+ return;
+ }
init_pattern_buffer (&buf, &regs);
msg = re_compile_pattern (regexp, strlen (regexp), &buf);
@@ -2042,7 +2061,6 @@ m4_patsubst (struct obstack *obs, int argc, token_data **argv)
return;
}
- victim = TOKEN_DATA_TEXT (argv[1]);
length = strlen (victim);
offset = 0;
@@ -2073,7 +2091,7 @@ m4_patsubst (struct obstack *obs, int argc, token_data **argv)
/* Handle the part of the string that was covered by the match. */
- substitute (obs, victim, ARG (3), &regs);
+ substitute (obs, victim, repl, &regs);
/* Update the offset to the end of the match. If the regexp
matched a null string, advance offset one more, to avoid
--
1.5.3.2