From: GNU bug Tracking System
Subject: [debbugs-tracker] bug#16777: closed ([PATCH] Revert "grep: DFA now uses rational ranges in unibyte locales")
Date: Sat, 01 Mar 2014 07:00:02 +0000

Your message dated Fri, 28 Feb 2014 22:59:10 -0800
with message-id <address@hidden>
and subject line Re: [PATCH] Revert "grep: DFA now uses rational ranges in 
unibyte locales"
has caused the debbugs.gnu.org bug report #16777,
regarding [PATCH] Revert "grep: DFA now uses rational ranges in unibyte locales"
to be marked as done.

(If you believe you have received this mail in error, please contact
address@hidden)


-- 
16777: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=16777
GNU Bug Tracking System
Contact address@hidden with problems
--- Begin Message ---
Subject: [PATCH] Revert "grep: DFA now uses rational ranges in unibyte locales"
Date: Mon, 17 Feb 2014 15:18:10 +0100

The correct course of action for grep is to defer range interpretation
to regex, because otherwise you can get mismatches between regexes with
backreferences and those without.

For example, [A-Z]. will use the rational range interpretation (RRI) but
([A-Z])\1 won't, with the confusing result that the first regex won't
match a superset of the language described by the second regex.

The source of the confusion is that, even though grep's dfa.c was changed
to use range checking instead of strcoll, that code is only invoked if
dfaexec is called with backref = NULL, and that never happens for grep!

In the end, all that's needed for RRI is compiling --with-included-regex,
and in that case the patch is almost a no-op.  Almost, because there
are corner cases that aren't handled correctly (e.g. [a-[.e.]], or
regular expressions that include a NUL character), but this can be
handled separately.

* NEWS: Revert paragraph introduced by commit 1078b64302.
* src/dfa.c (parse_bracket_exp): Revert back to regcomp/regexec.

Signed-off-by: Paolo Bonzini <address@hidden>
---
 NEWS      |  9 ---------
 src/dfa.c | 20 ++++++++++++++++++--
 2 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/NEWS b/NEWS
index 2ff7272..0130b90 100644
--- a/NEWS
+++ b/NEWS
@@ -10,15 +10,6 @@ GNU grep NEWS                                    -*- outline -*-
   grep (without -i) in a multibyte locale is now up to 7 times faster
   when processing many matched lines.
 
-  Range expressions in unibyte locales now ordinarily use the rational
-  range interpretation, in which [a-z] matches only lower-case ASCII
-  letters regardless of locale, and similarly for other ranges.  (This
-  was already true for multibyte locales.)  Portable programs should
-  continue to specify the C locale when using range expressions, since
-  these expressions have unspecified behavior in non-GNU systems and
-  are not yet guaranteed to use the rational range interpretation even
-  in GNU systems.
-
 ** Maintenance
 
   grep's --mmap option was disabled in March of 2010, and began to
diff --git a/src/dfa.c b/src/dfa.c
index f7453c7..a133e03 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -1106,14 +1106,30 @@ parse_bracket_exp (void)
             }
           else
             {
+              /* Defer to the system regex library about the meaning
+                 of range expressions.  */
+              regex_t re;
+              char pattern[6] = { '[', 0, '-', 0, ']', 0 };
+              char subject[2] = { 0, 0 };
               c1 = c;
               if (case_fold)
                 {
                   c1 = tolower (c1);
                   c2 = tolower (c2);
                 }
-              for (c = c1; c <= c2; c++)
-                setbit_case_fold_c (c, ccl);
+
+              pattern[1] = c1;
+              pattern[3] = c2;
+              regcomp (&re, pattern, REG_NOSUB);
+              for (c = 0; c < NOTCHAR; ++c)
+                {
+                  if ((case_fold && isupper (c)))
+                    continue;
+                  subject[0] = c;
+                  if (regexec (&re, subject, 0, NULL, 0) != REG_NOMATCH)
+                    setbit_case_fold_c (c, ccl);
+                }
+              regfree (&re);
             }
 
           colon_warning_state |= 8;
-- 
1.8.5.3




--- End Message ---
--- Begin Message ---
Subject: Re: [PATCH] Revert "grep: DFA now uses rational ranges in unibyte locales"
Date: Fri, 28 Feb 2014 22:59:10 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.3.0

As this patch is installed I'm taking the liberty of marking the bug as done.


--- End Message ---
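
The probing technique used in the parse_bracket_exp hunk of the patch can be tried outside grep. The following is a minimal standalone sketch, not code from grep itself: the names probe_range and NBYTES are hypothetical (NBYTES stands in for dfa.c's NOTCHAR), and case folding is left out. It builds a one-range bracket expression, compiles it with the POSIX regcomp, and asks regexec about every single-byte subject, so that range membership is whatever the regex library says it is in the current locale.

```c
#include <limits.h>
#include <regex.h>
#include <stdbool.h>
#include <string.h>

/* Number of distinct byte values; hypothetical stand-in for NOTCHAR.  */
enum { NBYTES = UCHAR_MAX + 1 };

/* Set members[b] to true for each byte b that the regex library places
   in the range [lo-hi] under the current locale.  Return false if the
   library rejects the pattern (for example, a reversed range).
   Like the patch, this does not handle special endpoints such as ']'
   or a NUL byte.  */
static bool
probe_range (unsigned char lo, unsigned char hi, bool members[NBYTES])
{
  char pattern[6] = { '[', 0, '-', 0, ']', 0 };
  char subject[2] = { 0, 0 };
  regex_t re;
  int c;

  memset (members, 0, NBYTES * sizeof members[0]);
  pattern[1] = (char) lo;
  pattern[3] = (char) hi;

  /* Unlike the patch, check the result of regcomp before using re.  */
  if (regcomp (&re, pattern, REG_NOSUB) != 0)
    return false;

  for (c = 0; c < NBYTES; c++)
    {
      /* For c == 0 the subject is the empty string, which a bracket
         expression can never match.  */
      subject[0] = (char) c;
      if (regexec (&re, subject, 0, NULL, 0) == 0)
        members[c] = true;
    }

  regfree (&re);
  return true;
}
```

A caller that has not called setlocale runs in the C locale, where probe_range ('a', 'f', members) is expected to mark only the bytes 'a' through 'f'. In other locales the result depends on the regex implementation in use, which is the behavior the patch deliberately defers to.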
