bug-grep
[PATCH v2 0/9] UTF-8 speedups


From: Paolo Bonzini
Subject: [PATCH v2 0/9] UTF-8 speedups
Date: Sun, 14 Mar 2010 16:35:05 +0100

Here is v2 of the patch series.  It no longer removes more code than
it adds :-) but it should now work for gawk as well.

Since support for case-insensitive multibyte matching carries a
performance penalty (mostly because dfamust rarely finds a good
string), I made it conditional on the GREP symbol.  A more general
scheme of feature bits can be added in the future, but this is good
enough for now.
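
For readers unfamiliar with this kind of compile-time gating, here is a
minimal sketch of the idea.  It is not code from the series: the helper
names wc_equal and case_fold_supported are hypothetical, and the only
assumption taken from the text above is that the grep build defines GREP
while other users of the matcher (such as gawk) leave it undefined.

```c
#include <stdbool.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical sketch: GREP is assumed to be defined by the grep build
   (for example with -DGREP) and left undefined by other users.  */

/* Compare two wide characters.  The case-insensitive path is only
   compiled in when GREP is defined; otherwise CASE_FOLD is ignored
   and the comparison is always exact.  */
static bool
wc_equal (wchar_t a, wchar_t b, bool case_fold)
{
#ifdef GREP
  if (case_fold)
    return towlower ((wint_t) a) == towlower ((wint_t) b);
#else
  (void) case_fold;   /* feature compiled out */
#endif
  return a == b;
}

/* Report whether the case-insensitive path was compiled in.  */
static bool
case_fold_supported (void)
{
#ifdef GREP
  return true;
#else
  return false;
#endif
}
```

Built with -DGREP, wc_equal (L'A', L'a', true) compares the lowercased
characters; built without it, the same call falls back to an exact
comparison, so callers that do not need the feature pay nothing for it.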

Compared to v1, I added Debian's character-set range patch (patch 2)
and fixed the warnings that Jim pointed out.

Paolo Bonzini (9):
  tests: add more UTF-8 test cases
  dfa: fix handling of ranges in multibyte character sets
  dfa: rewrite handling of multibyte case_fold lexing
  dfa: speed up handling of brackets
  dfa: optimize simple character sets under UTF-8 charsets
  dfa: cache MB_CUR_MAX for dfaexec
  dfa: run simple UTF-8 regexps as a single-byte character set
  grep: remove check_multibyte_string, fix non-UTF8 missed match
  grep: match multibyte charsets line-by-line when using -i

 .x-sc_cast_of_argument_to_free |    1 -
 .x-sc_space_tab                |    1 -
 NEWS                           |   15 +-
 src/dfa.c                      |  957 +++++++++++++++++++++-------------------
 src/dfa.h                      |    6 +
 src/grep.c                     |  108 ++---
 src/search.c                   |  244 ++++++-----
 tests/Makefile.am              |    7 +-
 tests/case-fold-backslash-w    |   14 +
 tests/case-fold-char-range     |   21 +
 tests/euc-mb                   |   23 +
 tests/foad1.sh                 |   10 +-
 tests/spencer1-locale          |   24 +
 tests/spencer1-locale.awk      |   30 ++
 14 files changed, 827 insertions(+), 634 deletions(-)
 delete mode 100644 .x-sc_cast_of_argument_to_free
 create mode 100755 tests/case-fold-backslash-w
 create mode 100644 tests/case-fold-char-range
 create mode 100644 tests/euc-mb
 create mode 100755 tests/spencer1-locale
 create mode 100644 tests/spencer1-locale.awk




