[PATCH] fall back to glibc matcher if a MBCSET is found


From: Paolo Bonzini
Subject: [PATCH] fall back to glibc matcher if a MBCSET is found
Date: Wed, 8 Sep 2010 10:47:05 +0200

This patch works around some of the performance problems of multibyte grep.
The patch has been in RHEL-6 for a few months.  I think it is also a
correctness patch, since grep has no way to support multi-character
collation elements.

In UTF-8 locales the fallback should trigger only when the pattern
contains an MBCSET, e.g. [a-z].  In other multibyte character sets,
every bracket expression and every `.` will trigger it as well.

* src/dfa.c (dfaexec): Fall back to glibc for multibyte matches,
if possible.
---
 src/dfa.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/src/dfa.c b/src/dfa.c
index 91124b6..3708be7 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -3237,6 +3237,15 @@ dfaexec (struct dfa *d, char const *begin, char *end,
                 continue;
               }
 
+            if (backref)
+              {
+                *backref = 1;
+                free(mblen_buf);
+                free(inputwcs);
+                *end = saved_end;
+                return (char *) p;
+              }
+
             /* Can match with a multibyte character (and multi character
                collating element).  Transition table might be updated.  */
             s = transit_state(d, s, &p);
-- 
1.7.1
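
The control flow the hunk relies on can be sketched outside of dfa.c.
What follows is only an illustration under stated assumptions, not the
code in grep: fast_search and line_matches are hypothetical names, the
"fast" matcher is a toy that looks for literal text (no regex
metacharacters) and gives up on the first non-ASCII byte, and the slow
path uses POSIX regcomp/regexec as a stand-in for the glibc matcher.
The part that mirrors the patch is the early return: set the flag,
hand back the current position, and let the caller re-check the line
with the slower matcher.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the fast matcher.  It finds NEEDLE in
   ASCII-only text.  On the first non-ASCII byte, if the caller passed
   a FALLBACK flag, it sets the flag and returns the current position
   instead of trying to handle the multibyte case itself.  */
static const char *
fast_search (const char *begin, const char *end, const char *needle,
             int *fallback)
{
  size_t n = strlen (needle);
  for (const char *p = begin; p < end; p++)
    {
      if ((unsigned char) *p >= 0x80 && fallback)
        {
          *fallback = 1;        /* ask the caller to use the slow path */
          return p;
        }
      if ((size_t) (end - p) >= n && memcmp (p, needle, n) == 0)
        return p;
    }
  return NULL;
}

/* Hypothetical caller.  Returns 1 if LINE matches PATTERN, else 0.
   *USED_REGEX reports whether the slow regexec path was taken.  */
static int
line_matches (const char *line, const char *pattern, int *used_regex)
{
  int fallback = 0;
  const char *hit;

  assert (line && pattern && used_regex);
  hit = fast_search (line, line + strlen (line), pattern, &fallback);
  *used_regex = fallback;
  if (!fallback)
    return hit != NULL;

  /* Slow path: let the regex library decide.  */
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return 0;
  int rc = regexec (&re, line, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

With this sketch, a pure-ASCII line is decided by fast_search alone,
while a line containing a multibyte sequence makes line_matches set
*used_regex and defer to regexec.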



