grep-commit
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

grep branch, master, updated. v2.22-19-g5cb49d2


From: Paul Eggert
Subject: grep branch, master, updated. v2.22-19-g5cb49d2
Date: Thu, 07 Jan 2016 06:41:23 +0000

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "grep".

The branch, master has been updated
       via  5cb49d2f375f0606ac9d916af6024d4b92ba0786 (commit)
      from  4f04b821d3bd00283cbfed17b81f68dba5fdd9cb (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
http://git.savannah.gnu.org/cgit/grep.git/commit/?id=5cb49d2f375f0606ac9d916af6024d4b92ba0786


commit 5cb49d2f375f0606ac9d916af6024d4b92ba0786
Author: Paul Eggert <address@hidden>
Date:   Wed Jan 6 22:40:23 2016 -0800

    Improve on fix for Bug#22181
    
    * src/pcresearch.c (Pexecute): Update subject when skipping past
    easily-determined encoding errors, as this is faster than letting
    pcre_exec skip them.  On my platform this improves performance
    4.7x on a benchmark created via "yes $(printf '\200\200\200\200
    \200\200\200\200\200\200\200\200\200\200\200\200\200\200\200\200x\n')
    | head -n 1000000 >j; grep -oP y j" in a UTF-8 locale.  Rework
    code that deals with PCRE_ERROR_BADUTF8 return, to avoid an
    incorrect (albeit currently harmless) 'bol = false' assignment.

diff --git a/src/pcresearch.c b/src/pcresearch.c
index 8f3d935..c0b8678 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -229,6 +229,7 @@ Pexecute (char *buf, size_t size, size_t *match_size,
           while (mbclen_cache[to_uchar (*p)] == (size_t) -1)
             {
               p++;
+              subject = p;
               bol = false;
             }
 
@@ -269,29 +270,30 @@ Pexecute (char *buf, size_t size, size_t *match_size,
             }
           int valid_bytes = sub[0];
 
-          /* Try to match the string before the encoding error.  */
-          if (valid_bytes < search_offset)
-            e = PCRE_ERROR_NOMATCH;
-          else if (valid_bytes == 0)
+          if (search_offset <= valid_bytes)
             {
-              /* Handle the empty-match case specially, for speed.
-                 This optimization is valid if VALID_BYTES is zero,
-                 which means SEARCH_OFFSET is also zero.  */
-              sub[1] = 0;
-              e = empty_match[bol];
-            }
-          else
-            e = jit_exec (subject, valid_bytes, search_offset,
-                          options | PCRE_NO_UTF8_CHECK | PCRE_NOTEOL, sub);
+              /* Try to match the string before the encoding error.  */
+              if (valid_bytes == 0)
+                {
+                  /* Handle the empty-match case specially, for speed.
+                     This optimization is valid if VALID_BYTES is zero,
+                     which means SEARCH_OFFSET is also zero.  */
+                  sub[1] = 0;
+                  e = empty_match[bol];
+                }
+              else
+                e = jit_exec (subject, valid_bytes, search_offset,
+                              options | PCRE_NO_UTF8_CHECK | PCRE_NOTEOL, sub);
 
-          if (e != PCRE_ERROR_NOMATCH)
-            break;
+              if (e != PCRE_ERROR_NOMATCH)
+                break;
+
+              /* Treat the encoding error as data that cannot match.  */
+              p = subject + valid_bytes + 1;
+              bol = false;
+            }
 
-          /* Treat the encoding error as data that cannot match.  */
           subject += valid_bytes + 1;
-          if (p < subject)
-            p = subject;
-          bol = false;
         }
 
       if (e != PCRE_ERROR_NOMATCH)

-----------------------------------------------------------------------

Summary of changes:
 src/pcresearch.c |   40 +++++++++++++++++++++-------------------
 1 files changed, 21 insertions(+), 19 deletions(-)


hooks/post-receive
-- 
grep



reply via email to

[Prev in Thread] Current Thread [Next in Thread]