[PATCH 6/6] grep: scan back thru UTF-8 a bit faster


From: Paul Eggert
Subject: [PATCH 6/6] grep: scan back thru UTF-8 a bit faster
Date: Tue, 24 Aug 2021 00:45:41 -0700

* src/searchutils.c (mb_goback): When scanning backward through
UTF-8, check the length implied by the putative byte 1 before
bothering to invoke mb_clen.  This length check also lets us use
mbrlen directly rather than calling mb_clen, which would
eventually defer to mbrlen anyway.
---
 src/searchutils.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/src/searchutils.c b/src/searchutils.c
index f16dd84..0080dd7 100644
--- a/src/searchutils.c
+++ b/src/searchutils.c
@@ -107,13 +107,20 @@ mb_goback (char const **mb_start, size_t *mbclen, char const *cur,
         for (int i = 1; i <= 3; i++)
           if ((cur[-i] & 0xc0) != 0x80)
             {
-              mbstate_t mbs = { 0 };
-              size_t clen = mb_clen (cur - i, end - (cur - i), &mbs);
-              if (i < clen && clen <= MB_LEN_MAX)
+              /* True if the length implied by the putative byte 1 at
+                 CUR[-I] extends at least through *CUR.  */
+              bool long_enough = (~cur[-i] & 0xff) >> (7 - i) == 0;
+
+              if (long_enough)
                 {
-                  /* This multibyte character contains *CUR.  */
-                  p0 = cur - i;
-                  p = p0 + clen;
+                  mbstate_t mbs = { 0 };
+                  size_t clen = mbrlen (cur - i, end - (cur - i), &mbs);
+                  if (clen <= MB_LEN_MAX)
+                    {
+                      /* This multibyte character contains *CUR.  */
+                      p0 = cur - i;
+                      p = p0 + clen;
+                    }
                 }
               break;
             }
-- 
2.31.1
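
The long_enough test in the patch is terse, so here is a minimal
standalone sketch of what it appears to compute. It is not part of the
patch: the helper names long_enough and implied_len are hypothetical,
and the byte is passed as an unsigned char rather than read from
CUR[-I]. The expression is true exactly when the top I+1 bits of the
byte are all set, which for a valid UTF-8 leading byte means the
sequence it starts is longer than I bytes and so reaches *CUR.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper mirroring the patch's expression.  B stands for
   the byte at CUR[-I], with I in 1..3.  True if the top I+1 bits of B
   are all 1.  */
static bool
long_enough (unsigned char b, int i)
{
  return (~b & 0xff) >> (7 - i) == 0;
}

/* Hypothetical reference: sequence length implied by a UTF-8 leading
   byte, or 0 if B is a continuation byte or in 0xf8..0xff, which
   cannot start a sequence of at most four bytes.  */
static int
implied_len (unsigned char b)
{
  if (b < 0x80)
    return 1;                   /* 0xxxxxxx: single byte */
  if ((b & 0xe0) == 0xc0)
    return 2;                   /* 110xxxxx */
  if ((b & 0xf0) == 0xe0)
    return 3;                   /* 1110xxxx */
  if ((b & 0xf8) == 0xf0)
    return 4;                   /* 11110xxx */
  return 0;
}
```

For example, with CUR pointing at the second byte of the two-byte
sequence 0xc3 0xa9, I is 1 and long_enough (0xc3, 1) holds, so mbrlen
is called. With CUR[-2] equal to 0xc3, long_enough (0xc3, 2) is false,
since a two-byte sequence starting there cannot cover *CUR, and the
mbrlen call is skipped. Bytes 0xf8 through 0xff also pass the bit
test; in this sketch's reading they are left for mbrlen to reject.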
