grep branch, master, updated. v2.16-18-g5133396

From: Jim Meyering
Subject: grep branch, master, updated. v2.16-18-g5133396
Date: Tue, 18 Feb 2014 01:36:39 +0000

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "grep".

The branch, master has been updated
       via  5133396666c4e9bbe30cc1510e3ec703452d6a8b (commit)
      from  825ab8547c54917fef77dc875ab070de2b8cb053 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------

commit 5133396666c4e9bbe30cc1510e3ec703452d6a8b
Author: Paolo Bonzini <address@hidden>
Date:   Mon Feb 17 15:18:10 2014 +0100

    revert "grep: DFA now uses rational ranges in unibyte locales"

    The correct course of action for grep is to defer range interpretation
    to regex, because otherwise you can get mismatches between regexes with
    backreferences and those without.

    For example, [A-Z]. will use RRI but ([A-Z])\1 won't, with the confusing
    result that the first regex won't match a superset of the language
    described by the second regex.

    The source of the confusion is that, even though grep's dfa.c was changed
    to use range checking instead of strcoll, that code is only invoked if
    dfaexec is called with backref = NULL, and that never happens for grep!

    In the end, all that's needed for RRI is compiling --with-included-regex,
    and in that case the patch is almost a no-op.  Almost, because there
    are corner cases that aren't handled correctly (e.g. [a-[.e.]], or
    regular expressions that include a NUL character), but this can be
    handled separately.

    * NEWS: Revert paragraph introduced by commit v2.16-7-g1078b64.
    * src/dfa.c (parse_bracket_exp): Revert back to regcomp/regexec.

diff --git a/NEWS b/NEWS
index 2ff7272..0130b90 100644
--- a/NEWS
+++ b/NEWS
@@ -10,15 +10,6 @@ GNU grep NEWS                                    -*- outline 
   grep (without -i) in a multibyte locale is now up to 7 times faster
   when processing many matched lines.
-  Range expressions in unibyte locales now ordinarily use the rational
-  range interpretation, in which [a-z] matches only lower-case ASCII
-  letters regardless of locale, and similarly for other ranges.  (This
-  was already true for multibyte locales.)  Portable programs should
-  continue to specify the C locale when using range expressions, since
-  these expressions have unspecified behavior in non-GNU systems and
-  are not yet guaranteed to use the rational range interpretation even
-  in GNU systems.
 ** Maintenance
   grep's --mmap option was disabled in March of 2010, and began to
diff --git a/src/dfa.c b/src/dfa.c
index f7453c7..a133e03 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -1106,14 +1106,30 @@ parse_bracket_exp (void)
+              /* Defer to the system regex library about the meaning
+                 of range expressions.  */
+              regex_t re;
+              char pattern[6] = { '[', 0, '-', 0, ']', 0 };
+              char subject[2] = { 0, 0 };
               c1 = c;
               if (case_fold)
                 {
                   c1 = tolower (c1);
                   c2 = tolower (c2);
                 }
-              for (c = c1; c <= c2; c++)
-                setbit_case_fold_c (c, ccl);
+              pattern[1] = c1;
+              pattern[3] = c2;
+              regcomp (&re, pattern, REG_NOSUB);
+              for (c = 0; c < NOTCHAR; ++c)
+                {
+                  if ((case_fold && isupper (c)))
+                    continue;
+                  subject[0] = c;
+                  if (regexec (&re, subject, 0, NULL, 0) != REG_NOMATCH)
+                    setbit_case_fold_c (c, ccl);
+                }
+              regfree (&re);
           colon_warning_state |= 8;


Summary of changes:
 NEWS      |    9 ---------
 src/dfa.c |   20 ++++++++++++++++++--
 2 files changed, 18 insertions(+), 11 deletions(-)
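
The technique restored by the src/dfa.c hunk above (ask the C library's
regex which bytes fall inside a range, rather than comparing byte values
directly) can be sketched in isolation roughly as follows.  This is only
an illustration, not grep's code: the helper name probe_range and the
member[] table are hypothetical, and the sketch assumes a unibyte locale
and range endpoints that are not special inside a bracket expression.

```c
#include <limits.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper, not part of grep.  Set member[c] to 1 for every
   byte C (1..UCHAR_MAX) that the system regex library considers part of
   the range [lo-hi] in the current locale, and to 0 otherwise.
   Returns 0 on success, or the nonzero regcomp error code.
   Assumption: LO and HI are ordinary characters, not ']', '^', '-',
   or NUL, which would change the meaning of the pattern.  */
static int
probe_range (unsigned char lo, unsigned char hi,
             unsigned char member[UCHAR_MAX + 1])
{
  regex_t re;
  char pattern[6] = { '[', 0, '-', 0, ']', 0 };
  char subject[2] = { 0, 0 };
  int err;
  int c;

  pattern[1] = (char) lo;
  pattern[3] = (char) hi;
  memset (member, 0, UCHAR_MAX + 1);

  /* Let the regex library decide what the range means.  */
  err = regcomp (&re, pattern, REG_NOSUB);
  if (err != 0)
    return err;

  /* Start at 1: a NUL byte would turn SUBJECT into the empty string,
     which a bracket expression cannot match.  */
  for (c = 1; c <= UCHAR_MAX; c++)
    {
      subject[0] = (char) c;
      if (regexec (&re, subject, 0, NULL, 0) == 0)
        member[c] = 1;
    }

  regfree (&re);
  return 0;
}
```

A caller would typically select a locale with setlocale, call
probe_range ('a', 'z', table), and then read table[c] for each byte of
interest.  Because the answer comes from regcomp/regexec, it should stay
consistent with whatever the same library decides for patterns that
contain backreferences, which is the point made in the log message.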

