grep-commit
grep branch, master, updated. v2.22-20-gd1160ec


From: Paul Eggert
Subject: grep branch, master, updated. v2.22-20-gd1160ec
Date: Fri, 08 Jan 2016 05:30:34 +0000

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "grep".

The branch, master has been updated
       via  d1160ec6d239b2e0f20c2fb3395e3b70963bf916 (commit)
      from  5cb49d2f375f0606ac9d916af6024d4b92ba0786 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
http://git.savannah.gnu.org/cgit/grep.git/commit/?id=d1160ec6d239b2e0f20c2fb3395e3b70963bf916


commit d1160ec6d239b2e0f20c2fb3395e3b70963bf916
Author: Paul Eggert <address@hidden>
Date:   Thu Jan 7 21:28:23 2016 -0800

    grep: improve unibyte -P performance
    
    This is a followon to the recent changes prompted by Bug#20526.
    In <http://bugs.gnu.org/bug=20526#86> Norihiro Tanaka pointed out
    that grep mistakenly assumed that unibyte locales cannot have
    encoding errors.  Here, the mistake hurt performance significantly.
    On Fedora 23 x86-64 in the C locale, this patch improved grep's
    performance by a factor of 7 when run as "grep -P 'z.*a'" on the
    output of "yes $(printf '\200\n') | head -n 1000000000".
    * src/pcresearch.c (multibyte_locale) [HAVE_LIBPCRE]: New static var.
    (Pcompile): Set it.
    (Pexecute): Use it to avoid the need to call
    buf_has_encoding_errors in unibyte locales.

diff --git a/src/pcresearch.c b/src/pcresearch.c
index c0b8678..1fae94d 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -84,6 +84,8 @@ jit_exec (char const *subject, int search_bytes, int search_offset,
 /* Table, indexed by ! (flag & PCRE_NOTBOL), of whether the empty
    string matches when that flag is used.  */
 static int empty_match[2];
+
+static bool multibyte_locale;
 #endif
 
 void
@@ -104,10 +106,14 @@ Pcompile (char const *pattern, size_t size)
   char const *p;
   char const *pnul;
 
-  if (using_utf8 ())
-    flags |= PCRE_UTF8;
-  else if (MB_CUR_MAX != 1)
-    error (EXIT_TROUBLE, 0, _("-P supports only unibyte and UTF-8 locales"));
+  if (1 < MB_CUR_MAX)
+    {
+      if (! using_utf8 ())
+        error (EXIT_TROUBLE, 0,
+               _("-P supports only unibyte and UTF-8 locales"));
+      multibyte_locale = true;
+      flags |= PCRE_UTF8;
+    }
 
   /* FIXME: Remove these restrictions.  */
   if (memchr (pattern, '\n', size))
@@ -194,12 +200,16 @@ Pexecute (char *buf, size_t size, size_t *match_size,
      error.  */
   char const *subject = buf;
 
-  /* If the input is free of encoding errors a multiline search is
+  /* If the input is unibyte or is free of encoding errors a multiline search is
      typically more efficient.  Otherwise, a single-line search is
      typically faster, so that pcre_exec doesn't waste time validating
      the entire input buffer.  */
-  bool multiline = ! buf_has_encoding_errors (buf, size - 1);
-  buf[size - 1] = eolbyte;
+  bool multiline = true;
+  if (multibyte_locale)
+    {
+      multiline = ! buf_has_encoding_errors (buf, size - 1);
+      buf[size - 1] = eolbyte;
+    }
 
   for (; p < buf + size; p = line_start = line_end + 1)
     {

-----------------------------------------------------------------------

Summary of changes:
 src/pcresearch.c |   24 +++++++++++++++++-------
 1 files changed, 17 insertions(+), 7 deletions(-)
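
The idea behind the change can be sketched in isolation: record once, at
pattern-compile time, whether the locale is multibyte, and consult that
flag at search time so that unibyte locales skip the encoding scan
entirely.  This is a minimal sketch under stated assumptions, not the
code in src/pcresearch.c.  The names sketch_compile, sketch_use_multiline,
has_high_bytes and scans_performed are hypothetical, and has_high_bytes
is only a crude stand-in for buf_has_encoding_errors (it flags any byte
with the high bit set rather than validating UTF-8).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the new static flag; set once per pattern.  */
static bool multibyte_locale;

/* Hypothetical counter that makes the skipped scan observable.  */
static unsigned long scans_performed;

/* Crude stand-in for buf_has_encoding_errors (an assumption): report
   any byte with the high bit set.  Real UTF-8 validation does more.  */
static bool
has_high_bytes (char const *buf, size_t size)
{
  scans_performed++;
  for (size_t i = 0; i < size; i++)
    if (0x80 <= (unsigned char) buf[i])
      return true;
  return false;
}

/* Record the locale kind once.  MB_CUR_MAX_VALUE plays the role of
   MB_CUR_MAX, which is at least 1 in any locale.  */
static void
sketch_compile (size_t mb_cur_max_value)
{
  assert (1 <= mb_cur_max_value);
  multibyte_locale = 1 < mb_cur_max_value;
}

/* Decide whether a multiline search may be used for BUF.  In a unibyte
   locale the answer is always yes and the buffer is never scanned.  */
static bool
sketch_use_multiline (char const *buf, size_t size)
{
  bool multiline = true;
  if (multibyte_locale)
    multiline = ! has_high_bytes (buf, size);
  return multiline;
}
```

With this sketch, calling sketch_compile (1) and then
sketch_use_multiline on a buffer of "\200\n" lines returns true without
scanning the buffer, whereas after sketch_compile (6) the same buffer is
scanned once and the function returns false.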


hooks/post-receive
-- 
grep


