[PATCH 1/2] dfa: process range expressions consistently with system regex


From: Paolo Bonzini
Subject: [PATCH 1/2] dfa: process range expressions consistently with system regex
Date: Tue, 21 Sep 2010 17:58:57 +0200

glibc does not interpret range expressions strictly by strcoll order,
which makes grep's behavior hard to predict when it is compiled with
the system regex.  Leave the decision of which single-byte characters
a range expression matches to the system regex matcher.

This partially reverts a change made in commit 0d38a8bb (which made
sense at the time, but not now that src/dfa.c is not doing multibyte
character set matching anymore).

* src/dfa.c (in_coll_range): Remove.
(parse_bracket_exp): Use the system regex to find which single-byte
characters match a range expression.
---
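The probing technique described above can be sketched as a standalone
helper.  This is an illustration only, not the code in the patch:
range_members and NCHARS are hypothetical names (NCHARS stands in for
NOTCHAR), the regcomp return value is checked here although the patch
does not check it, and byte 0 is skipped because it would yield an
empty subject string.

```c
#include <regex.h>
#include <stdbool.h>
#include <string.h>

#define NCHARS 256   /* assumption: stands in for NOTCHAR in src/dfa.c */

/* Hypothetical helper: set members[c] for every single-byte character c
   that the system regex library says matches [c1-c2].  Returns 0 on
   success, or the regcomp error code if the pattern does not compile.  */
static int
range_members (char c1, char c2, bool members[NCHARS])
{
  regex_t re;
  char pattern[6] = { '[', c1, '-', c2, ']', 0 };
  char subject[2] = { 0, 0 };
  int err, c;

  memset (members, 0, NCHARS * sizeof members[0]);
  err = regcomp (&re, pattern, REG_NOSUB);
  if (err != 0)
    return err;

  /* Byte 0 would give an empty subject, which cannot match.  */
  for (c = 1; c < NCHARS; c++)
    {
      subject[0] = (char) c;
      members[c] = regexec (&re, subject, 0, NULL, 0) == 0;
    }
  regfree (&re);
  return 0;
}
```

The result depends on the locale in effect when the helper runs, which
is the point: whatever the system matcher decides is what gets used.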
 NEWS      |    6 ++++++
 src/dfa.c |   27 ++++++++++++++++-----------
 2 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/NEWS b/NEWS
index 01bbd21..539e978 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,12 @@ GNU grep NEWS                                    -*- outline -*-
 
 * Noteworthy changes in release ?.? (????-??-??) [?]
 
+** Bug fixes
+
+  grep's interpretation of range expressions is now more consistent with
+  that of other tools.  [bug present since multi-byte character set
+  support was introduced in 2.5.2, though the steps needed to reproduce
+  it changed in grep-2.6]
 
 * Noteworthy changes in release 2.7 (2010-09-16) [stable]
 
diff --git a/src/dfa.c b/src/dfa.c
index a2f4174..f3e066f 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -697,13 +697,6 @@ static unsigned char const *buf_end;       /* reference to end in dfaexec().  */
 
 #endif /* MBS_SUPPORT */
 
-static int
-in_coll_range (char ch, char from, char to)
-{
-  char c[6] = { from, 0, ch, 0, to, 0 };
-  return strcoll (&c[0], &c[2]) <= 0 && strcoll (&c[2], &c[4]) <= 0;
-}
-
 typedef int predicate (int);
 
 /* The following list maps the names of the Posix named character classes
@@ -979,10 +972,22 @@ parse_bracket_exp (void)
                 for (c = c1; c <= c2; c++)
                   setbit_case_fold (c, ccl);
               else
-                for (c = 0; c < NOTCHAR; ++c)
-                  if (!(case_fold && isupper (c))
-                      && in_coll_range (c, c1, c2))
-                    setbit_case_fold (c, ccl);
+                {
+                  /* Defer to the system regex library about the meaning
+                     of range expressions.  */
+                  regex_t re;
+                  char pattern[6] = { '[', c1, '-', c2, ']', 0 };
+                  char subject[2] = { 0, 0 };
+                  regcomp (&re, pattern, REG_NOSUB);
+                  for (c = 0; c < NOTCHAR; ++c)
+                    {
+                      subject[0] = c;
+                      if (!(case_fold && isupper (c))
+                          && regexec (&re, subject, 0, NULL, 0) != REG_NOMATCH)
+                        setbit_case_fold (c, ccl);
+                    }
+                  regfree (&re);
+                }
             }
 
           colon_warning_state |= 8;
-- 
1.7.2.3
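
For comparison, the strcoll-based test that this patch removes can be
restated as a small standalone sketch.  The helper body follows the
removed in_coll_range; coll_range_members and NCHARS are hypothetical
names added for illustration.  In the C locale strcoll orders bytes
like strcmp, so the range is a plain byte interval; in other locales
the collation order, and therefore the set, may differ from what the
system regex matcher reports.

```c
#include <stdbool.h>
#include <string.h>

#define NCHARS 256   /* assumption: stands in for NOTCHAR in src/dfa.c */

/* True when from <= ch <= to in the current locale's collation order,
   comparing one-character strings with strcoll.  */
static bool
in_coll_range (char ch, char from, char to)
{
  char c[6] = { from, 0, ch, 0, to, 0 };
  return strcoll (&c[0], &c[2]) <= 0 && strcoll (&c[2], &c[4]) <= 0;
}

/* Hypothetical helper: record every single-byte character that the
   strcoll test places inside the range from..to.  */
static void
coll_range_members (char from, char to, bool members[NCHARS])
{
  int c;

  for (c = 0; c < NCHARS; c++)
    members[c] = in_coll_range ((char) c, from, to);
}
```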




