grep branch, master, updated. v2.16-18-g5133396
From: Jim Meyering
Subject: grep branch, master, updated. v2.16-18-g5133396
Date: Tue, 18 Feb 2014 01:36:39 +0000
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "grep".
The branch, master has been updated
via 5133396666c4e9bbe30cc1510e3ec703452d6a8b (commit)
from 825ab8547c54917fef77dc875ab070de2b8cb053 (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
http://git.savannah.gnu.org/cgit/grep.git/commit/?id=5133396666c4e9bbe30cc1510e3ec703452d6a8b
commit 5133396666c4e9bbe30cc1510e3ec703452d6a8b
Author: Paolo Bonzini <address@hidden>
Date: Mon Feb 17 15:18:10 2014 +0100
revert "grep: DFA now uses rational ranges in unibyte locales"
The correct course of action for grep is to defer range interpretation
to regex, because otherwise you can get mismatches between regexes with
backreferences and those without.
For example, [A-Z]. will use RRI but ([A-Z])\1 won't, with the confusing
result that the first regex won't match a superset of the language
described by the second regex.
The source of the confusion is that, even though grep's dfa.c was changed
to use range checking instead of strcoll, that code is only invoked if
dfaexec is called with backref = NULL, and that never happens for grep!
In the end, all that's needed for RRI is compiling --with-included-regex,
and in that case the patch is almost a no-op. Almost, because there
are corner cases that aren't handled correctly (e.g. [a-[.e.]], or
regular expressions that include a NUL character), but this can be
handled separately.
* NEWS: Revert paragraph introduced by commit v2.16-7-g1078b64.
* src/dfa.c (parse_bracket_exp): Revert back to regcomp/regexec.
diff --git a/NEWS b/NEWS
index 2ff7272..0130b90 100644
--- a/NEWS
+++ b/NEWS
@@ -10,15 +10,6 @@ GNU grep NEWS -*- outline -*-
grep (without -i) in a multibyte locale is now up to 7 times faster
when processing many matched lines.
- Range expressions in unibyte locales now ordinarily use the rational
- range interpretation, in which [a-z] matches only lower-case ASCII
- letters regardless of locale, and similarly for other ranges. (This
- was already true for multibyte locales.) Portable programs should
- continue to specify the C locale when using range expressions, since
- these expressions have unspecified behavior in non-GNU systems and
- are not yet guaranteed to use the rational range interpretation even
- in GNU systems.
-
** Maintenance
grep's --mmap option was disabled in March of 2010, and began to
diff --git a/src/dfa.c b/src/dfa.c
index f7453c7..a133e03 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -1106,14 +1106,30 @@ parse_bracket_exp (void)
}
else
{
+ /* Defer to the system regex library about the meaning
+ of range expressions. */
+ regex_t re;
+ char pattern[6] = { '[', 0, '-', 0, ']', 0 };
+ char subject[2] = { 0, 0 };
c1 = c;
if (case_fold)
{
c1 = tolower (c1);
c2 = tolower (c2);
}
- for (c = c1; c <= c2; c++)
- setbit_case_fold_c (c, ccl);
+
+ pattern[1] = c1;
+ pattern[3] = c2;
+ regcomp (&re, pattern, REG_NOSUB);
+ for (c = 0; c < NOTCHAR; ++c)
+ {
+ if ((case_fold && isupper (c)))
+ continue;
+ subject[0] = c;
+ if (regexec (&re, subject, 0, NULL, 0) != REG_NOMATCH)
+ setbit_case_fold_c (c, ccl);
+ }
+ regfree (&re);
}
colon_warning_state |= 8;
-----------------------------------------------------------------------
Summary of changes:
NEWS | 9 ---------
src/dfa.c | 20 ++++++++++++++++++--
2 files changed, 18 insertions(+), 11 deletions(-)
hooks/post-receive
--
grep