bug-grep

Re: [bug #36567] grep -i (case-insensitive) is broken with UTF8


From: Jim Meyering
Subject: Re: [bug #36567] grep -i (case-insensitive) is broken with UTF8
Date: Tue, 12 Jun 2012 18:27:58 +0200

Paul Eggert wrote:
> Follow-up Comment #3, bug #36567 (project grep):
> I'm attaching a further fix, to fix a problem reported today about the
> installed patch.

Thanks, Paul.

That newer patch looks fine.
I've included a copy below for reference.

But please move the declaration of ombclen down to its initialization.
The only part of grep for which we avoid decl-after-stmt is dfa.c.

Did you try to create a test case that triggers this bug?
It would be nice to do that, or if that is not possible,
to say that no such case exists (at least with UTF-8).

From 3c68148e7198476dc82804b550fc51408a806d28 Mon Sep 17 00:00:00 2001
From: Paul Eggert <address@hidden>
Date: Tue, 12 Jun 2012 08:46:18 -0700
Subject: [PATCH] grep: handle -i when chars differ in length but line does
 not

* src/searchutils.c (mbtolower): Return the map back to the caller
if any input character's length differs from the corresponding output
character's, not merely if the total string length differs.
Problem reported by Johannes Mercer in
<http://lists.gnu.org/archive/html/bug-grep/2012-06/msg00029.html>.
---
 src/searchutils.c |   17 ++++++++++-------
 1 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/src/searchutils.c b/src/searchutils.c
index 4942c51..1e2cb35 100644
--- a/src/searchutils.c
+++ b/src/searchutils.c
@@ -55,7 +55,8 @@ kwsinit (kwset_t *kwset)
    to the buffer and reuses it on any subsequent call.  As a consequence,
    this function is not thread-safe.

-   When the lowercase result string has the same length as the input string,
+   When all the characters in the lowercase result string have the
+   same length as corresponding characters in the input string,
    set *LEN_MAP_P to NULL.  Otherwise, set it to a malloc'd buffer (like the
    returned buffer, this must not be freed by caller) of the same length as
    the result string.  (*LEN_MAP_P)[J] is one less than the length-in-bytes
@@ -74,6 +75,7 @@ mbtolower (const char *beg, size_t *n, unsigned char **len_map_p)
   const char *end;
   char *p;
   unsigned char *m;
+  bool lengths_differ = false;

   if (*n > outalloc || outalloc == 0)
     {
@@ -99,7 +101,8 @@ mbtolower (const char *beg, size_t *n, unsigned char **len_map_p)
   while (beg < end)
     {
       wchar_t wc;
-      size_t mbclen = mbrtowc(&wc, beg, end - beg, &is);
+      size_t mbclen = mbrtowc (&wc, beg, end - beg, &is);
+      size_t ombclen;
       if (outlen + mb_cur_max >= outalloc)
         {
           size_t dm = m - len_map;
@@ -123,14 +126,14 @@ mbtolower (const char *beg, size_t *n, unsigned char **len_map_p)
         {
           *m++ = mbclen - 1;
           beg += mbclen;
-          mbclen = wcrtomb (p, towlower ((wint_t) wc), &os);
-          p += mbclen;
-          outlen += mbclen;
+          ombclen = wcrtomb (p, towlower ((wint_t) wc), &os);
+          p += ombclen;
+          outlen += ombclen;
+          lengths_differ |= (mbclen != ombclen);
         }
     }

-  /* If the new length differs from the original, give caller the map.  */
-  *len_map_p = p - out == *n ? NULL : len_map;
+  *len_map_p = lengths_differ ? len_map : NULL;
   *n = p - out;
   *p = 0;
   return out;
--
1.7.6.5


