bug-grep
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [patch #7134] Patch for is_mb_middle in searchutil.c


From: Jim Meyering
Subject: Re: [patch #7134] Patch for is_mb_middle in searchutil.c
Date: Sun, 28 Mar 2010 18:17:06 +0200

Jim Meyering wrote:
> Norihirio Tanaka wrote:
>> URL:
>>   <http://savannah.gnu.org/patch/?7134>
>>
>>                  Summary: Patch for is_mb_middle in searchutil.c
>>                  Project: grep
>>             Submitted by: noritnk
>>             Submitted on: 2010年03月26日 13時00分00秒
>>                 Category: None
>>                 Priority: 5 - Normal
>>                   Status: None
>>                  Privacy: Public
>>              Assigned to: None
>>         Originator Email:
>>              Open/Closed: Open
>>          Discussion Lock: Any
>>
>> Details:
>>
>> I seem is_mb_middle has two bugs.
>> This patch (merged) will be corrected both their bugs.
>>
>> 1. fgrep (2.6.1) hangs on a pattern with invalid sequence.
>>
>> ==> See attachment `fgrep-hang'.
>>
>> 2. A complete multibyte string matches with incomplate
>> multibyte pattern with truncated octet character in fgrep (2.6.1).
>>
>> ==> See attachment `truncated-character.
>
> Thank you for the patch and test cases!
>
> I've adjusted your test scripts slightly, hooked them
> up via tests/Makefile.am and added most of the commit logs.
> I confirm that each test exercises a bug that is fixed by the patch.
> I'll study your patch over the weekend.
> Hmm... or maybe not.  Continue reading...

I could not risk studying your patch (don't want to have to wait for
the copyright assignment process), so wrote my own.  Since I've had to
modify the test case (with your patch, grep printed the line, with mine
it does not), I gather that our solutions are fundamentally different.
In addition, I've changed the test to timeout after 10 seconds when run
against a version of grep with the flaw.


>From 4767f2f87399987afed8787de5cd4973358f76cd Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Sun, 28 Mar 2010 15:33:10 +0200
Subject: [PATCH 1/2] grep -F: avoid infinite loop when searching for incomplete 
MB character

Searching for an incomplete non-prefix of a multi-byte character
should find no match.

  Just as these print nothing,
    printf '\357\274\241\357\274\241\n' \
      | perl -CIO -ne '/\241\357/ and print'
    printf '\357\274\241\n' | perl -CIO -ne '/\274\241/ and print'
    printf '\357\274\241\n' | perl -CIO -ne '/\241/ and print'
    printf '\357\274\241\n' | perl -CIO -ne '/\274/ and print'

  These should also print nothing, but with grep-2.6 and grep-2.6.1,
  they would infloop:
    printf '\357\274\241\n' | LC_ALL=en_US.UTF-8 src/grep -F $'\241'
    printf '\357\274\241\n' | LC_ALL=en_US.UTF-8 src/grep -F $'\274'
    printf '\357\274\241\n' | LC_ALL=en_US.UTF-8 src/grep -F $'\274\241'

* src/kwsearch.c (Fexecute): Don't infloop when searching for
an incomplete non-prefix part of a multi-byte character.
* NEWS (Bug fixes): Mention it.
Reported and diagnosed by Norihiro Tanaka.
---
 NEWS           |    3 +++
 src/kwsearch.c |    3 +++
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/NEWS b/NEWS
index 6c9943e..91a0b42 100644
--- a/NEWS
+++ b/NEWS
@@ -4,6 +4,9 @@ GNU grep NEWS                                    -*- outline -*-

 ** Bug fixes

+  grep -F no longer goes into an infinite loop when it finds a match
+  for an incomplete (non-prefix of a) multibyte character
+
   The --include option works once again.  [bug introduced in 2.6]

   Using any of the --include or --exclude* options would cause a NULL
diff --git a/src/kwsearch.c b/src/kwsearch.c
index fa801e6..85a28bc 100644
--- a/src/kwsearch.c
+++ b/src/kwsearch.c
@@ -104,8 +104,11 @@ Fexecute (char const *buf, size_t size, size_t *match_size,
       if (offset == (size_t) -1)
        goto failure;
 #ifdef MBS_SUPPORT
+      char const *s0 = mb_start;
       if (MB_CUR_MAX > 1 && is_mb_middle (&mb_start, beg + offset, buf + size))
         {
+         if (mb_start == s0)
+           goto failure;
           beg = mb_start - 1;
           continue; /* It is a part of multibyte character.  */
         }
--
1.7.0.3.448.g82eeb


>From f68aa3d7d9d74407d02a1f3552903a0495910015 Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka <address@hidden>
Date: Sun, 28 Mar 2010 17:56:22 +0200
Subject: [PATCH 2/2] tests: add tests for the fgrep-infloop bug

* tests/init.cfg (require_timeout_): New function.
* tests/fgrep-infloop: New file.  Test for the above fix.
* tests/Makefile.am (TESTS): Add it.
---
 NEWS                |    4 ++--
 tests/Makefile.am   |    1 +
 tests/fgrep-infloop |   20 ++++++++++++++++++++
 tests/init.cfg      |    6 ++++++
 4 files changed, 29 insertions(+), 2 deletions(-)
 create mode 100644 tests/fgrep-infloop

diff --git a/NEWS b/NEWS
index 91a0b42..2e8ef15 100644
--- a/NEWS
+++ b/NEWS
@@ -4,8 +4,8 @@ GNU grep NEWS                                    -*- outline -*-

 ** Bug fixes

-  grep -F no longer goes into an infinite loop when it finds a match
-  for an incomplete (non-prefix of a) multibyte character
+  grep -F no longer goes into an infinite loop when it finds a match for an
+  incomplete (non-prefix of a) multibyte character.  [bug introduced in 2.6]

   The --include option works once again.  [bug introduced in 2.6]

diff --git a/tests/Makefile.am b/tests/Makefile.am
index f9ba21b..9e0224f 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -29,6 +29,7 @@ TESTS =                                               \
   ere.sh                                       \
   euc-mb                                       \
   fedora                                       \
+  fgrep-infloop                                        \
   file.sh                                      \
   fmbtest                                      \
   foad1                                                \
diff --git a/tests/fgrep-infloop b/tests/fgrep-infloop
new file mode 100644
index 0000000..159baca
--- /dev/null
+++ b/tests/fgrep-infloop
@@ -0,0 +1,20 @@
+#!/bin/sh
+# This would infloop for grep-2.6.1
+: ${srcdir=.}
+. "$srcdir/init.sh"; path_prepend_ ../src
+
+require_timeout_
+
+encode() { echo "$1" | tr ABC '\357\274\241'; }
+
+fail=0
+
+for LOC in en_US.UTF-8 $LOCALE_FR_UTF8; do
+  out=out1-$LOC
+  encode ABC \
+    | LC_ALL=$LOC timeout 10s grep -F "$(encode BC)" > $out 2>&1
+  test $? = 1 || fail=1
+  compare $out /dev/null || fail=1
+done
+
+Exit $fail
diff --git a/tests/init.cfg b/tests/init.cfg
index 0ec60f1..6f957b3 100644
--- a/tests/init.cfg
+++ b/tests/init.cfg
@@ -32,3 +32,9 @@ do
 done

 test "$envvar_check_fail" = 1 && fail_ "failed to unset the above envvars"
+
+require_timeout_()
+{
+  ( timeout --version ) > /dev/null 2>&1 \
+    || skip_ your system lacks the timeout program
+}
--
1.7.0.3.448.g82eeb




reply via email to

[Prev in Thread] Current Thread [Next in Thread]