Re: The bug in the Grep Command


From: Jim Meyering
Subject: Re: The bug in the Grep Command
Date: Mon, 06 Aug 2012 13:34:37 +0200

address@hidden wrote:
> The grep command in the current version of Debian (6) does not work
> correctly when the -i and -n options are used together and the pattern
> matches an empty line.
>
> For example:
>
> grep -in "^$" < /etc/inittab
>
> or
> grep -in "^$" < /etc/X11/xorg.conf

Wow!  Thank you for the report.

At first I didn't see a problem with the latest (2.13):

    $ printf 'a\n\nb\n' |grep -in '^$'
    2:

or with debian unstable's grep-2.12.

Then I remembered that I usually use the C locale...
which means I'm running the equivalent of this:

    $ printf 'a\n\nb\n' | LC_ALL=C grep -in '^$'

If I run in a UTF-8 locale, which is probably what you have, like this:

    $ printf 'a\n\nb\n' | LC_ALL=en_US.utf8 grep -in '^$'

I am flabbergasted to see this erroneous output:

    2:3:

Worse still, the following command exits with status 0 (i.e., it reports
a match) even though nothing matches(!):

    $ seq 2|LC_ALL=en_US.utf8 grep -i '^$'; echo $?
    0
    $

With -n, and a larger input, the problem is a little clearer:

    $ seq 9|LC_ALL=en_US.utf8 grep -in '^$'
    2:4:6:8:10:12:14:16:$

I've just fixed the bug (patch below).
Now it does this:

    $ seq 9|LC_ALL=en_US.utf8 src/grep -i '^$'; echo $?
    1
    $
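
For anyone who wants to check an installed grep, here is a minimal sketch of
the exit-status check. The function name is made up for illustration and is
not part of grep's test suite. It assumes the en_US.UTF-8 locale is
installed; if it is not, grep will most likely fall back to the C locale
and the second call proves nothing.

```shell
#!/bin/sh
# Sketch: with no empty line in the input, grep -in '^$' should print
# nothing and exit with status 1, regardless of the locale.

check_no_empty_line_match () {
  # $1 is the locale to test (hypothetical helper, for illustration only).
  # The input "1\n2\n" contains no empty line.
  out=$(printf '1\n2\n' | LC_ALL=$1 grep -in '^$')
  status=$?   # exit status of grep, the last command in the pipeline
  if test "$status" -eq 1 && test -z "$out"; then
    echo "ok: $1"
  else
    echo "FAIL: $1 (status=$status, output='$out')"
    return 1
  fi
}

check_no_empty_line_match C
# Assumption: this locale name exists on the system.
check_no_empty_line_match en_US.UTF-8
```

An affected grep should report FAIL for the UTF-8 locale, since it exits 0
there.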

=========================================================
Here's a preliminary patch.
The test name, "ni", is going to change
especially now that I know the bug is independent
of the use of "-n".  Also, I will add something
like the examples above.

From b7850c794ee0174774567f55d3d7ef61cd9d1445 Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Sun, 5 Aug 2012 23:22:28 +0200
Subject: [PATCH 1/2] tests: test for bug with -n and -i in a multi-byte
 locale

* tests/ni: New file.
* tests/Makefile.am (TESTS): Add it.
Reported by address@hidden
---
 tests/Makefile.am |  1 +
 tests/ni          | 23 +++++++++++++++++++++++
 2 files changed, 24 insertions(+)
 create mode 100755 tests/ni

diff --git a/tests/Makefile.am b/tests/Makefile.am
index 7d95862..cbd69ee 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -69,6 +69,7 @@ TESTS =                                               \
   inconsistent-range                            \
   khadafy                                      \
   max-count-vs-context                         \
+  ni                                           \
   unibyte-bracket-expr                         \
   high-bit-range                               \
   options                                      \
diff --git a/tests/ni b/tests/ni
new file mode 100755
index 0000000..0e78655
--- /dev/null
+++ b/tests/ni
@@ -0,0 +1,23 @@
+#! /bin/sh
+# Test using -n with -i in a multibyte locale.
+#
+# Copyright (C) 2012 Free Software Foundation, Inc.
+#
+# Copying and distribution of this file, with or without modification,
+# are permitted in any medium without royalty provided the copyright
+# notice and this notice are preserved.
+
+. "${srcdir=.}/init.sh"; path_prepend_ ../src
+
+require_en_utf8_locale_
+
+LC_ALL=en_US.UTF-8
+export LC_ALL
+
+printf 'a\n\nb\n' > in || framework_failure_
+printf '2:\n' > exp || framework_failure_
+
+grep -n -i '^$' in > out || fail=1
+compare exp out || fail=1
+
+Exit $fail
--
1.7.12.rc1.22.gbfbf4d4


From cbc79980d39e4db04974b1182d2670fea8b10016 Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Mon, 6 Aug 2012 13:29:51 +0200
Subject: [PATCH 2/2] grep -i '^$' in a multi-byte locale could report a false
 match

* src/dfasearch.c (EGexecute): Do not match the sentinel "newline"
that is appended to each buffer.
* NEWS (Bug fixes): Mention it.

tests: test for bug with -i in a multi-byte locale
---
 NEWS            | 7 +++++++
 src/dfasearch.c | 4 +++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index fdba25e..72a90e7 100644
--- a/NEWS
+++ b/NEWS
@@ -4,6 +4,13 @@ GNU grep NEWS                                    -*- outline -*-

 ** Bug fixes

+  grep -i '^$' could exit 0 (i.e., report a match) in a multi-byte locale,
+  even though there was no match, and the command generated no output.
+  E.g., printf 'a\nb\n'|LC_ALL=en_US.utf8 grep -il '^$' would mistakenly
+  print "(standard input)".  Related, seq 9|LC_ALL=en_US.utf8 grep -in '^$'
+  would print "2:4:6:8:10:12:14:16" with no trailing newline.
+  [bug introduced in grep-2.6]
+
   'grep' no longer falsely reports text files as being binary on file
   systems that compress contents or that store tiny contents in metadata.

diff --git a/src/dfasearch.c b/src/dfasearch.c
index 1121176..29c096a 100644
--- a/src/dfasearch.c
+++ b/src/dfasearch.c
@@ -277,7 +277,9 @@ EGexecute (char const *buf, size_t size, size_t *match_size,
               /* No good fixed strings; start with DFA. */
               char const *next_beg = dfaexec (dfa, beg, (char *) buflim,
                                               0, NULL, &backref);
-              if (next_beg == NULL)
+              /* If there's no match, or if we've matched the sentinel,
+                 we're done.  */
+              if (next_beg == NULL || next_beg == buflim)
                 break;
               /* Narrow down to the line we've found. */
               beg = next_beg;
--
1.7.12.rc1.22.gbfbf4d4
