From: GNU bug Tracking System
Subject: [debbugs-tracker] bug#23932: closed (dfa: use algorithm for single byte character to any single byte character in input text always)
Date: Thu, 01 Sep 2016 18:51:01 +0000

Your message dated Thu, 1 Sep 2016 11:49:59 -0700 with message-id <address@hidden> and subject line Re: bug#23932: dfa: use algorithm for single byte character to any single byte character in input text always has caused the debbugs.gnu.org bug report #23932, regarding dfa: use algorithm for single byte character to any single byte character in input text always to be marked as done.

(If you believe you have received this mail in error, please contact address@hidden)

-- 
23932: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=23932
GNU Bug Tracking System
Contact address@hidden with problems
--- Begin Message ---
Subject: dfa: use algorithm for single byte character to any single byte character in input text always
Date: Sun, 10 Jul 2016 18:51:43 +0900

In multibyte locales, if a pattern starts with a period expression, matching is still slow, because the transition is computed at run time even when the next character in the input text is a single byte.

This patch makes the matcher always use the single-byte algorithm when the next character in the input text is a single byte. If the transition table has already been built and the next input character is single byte, the matcher moves to the next state by consulting only the pre-built transition table, even from a state that includes ANYCHAR.

$ yes "$(printf 'a%038db\n' 0)" | head -1000000 >in
$ env LC_ALL=C gcc -v
Reading specs from /usr/local/lib/gcc/x86_64-pc-linux-gnu/4.4.7/specs
Target: x86_64-pc-linux-gnu
Configured with: ./configure --with-as=/usr/local/bin/as --with-ld=/usr/local/bin/ld --with-system-zlib --enable-__cxa_atexit
Thread model: posix
gcc version 4.4.7 (GCC)

patch#21486 must be applied before this patch. This patch speeds grep up further.

[grep-2.25]
$ time -p env LC_ALL=ja_JP.eucjp grep .a.b in
real 4.78
user 4.42
sys 0.16
$ time -p env LC_ALL=ja_JP.eucjp grep '.\{41\}' in
real 46.23
user 43.98
sys 0.21

[after patch#21486]
$ time -p env LC_ALL=ja_JP.eucjp src/grep .a.b in
real 1.26
user 1.08
sys 0.08
$ time -p env LC_ALL=ja_JP.eucjp src/grep '.\{41\}' in
real 1.14
user 1.00
sys 0.10

[after this patch]
$ time -p env LC_ALL=ja_JP.eucjp src/grep .a.b in
real 0.47
user 0.36
sys 0.07
$ time -p env LC_ALL=ja_JP.eucjp src/grep '.\{41\}' in
real 0.24
user 0.18
sys 0.05

[locale C (ref.)]
$ time -p env LC_ALL=C src/grep .a.b in
real 0.23
user 0.11
sys 0.09
$ time -p env LC_ALL=C src/grep '.\{41\}' in
real 0.22
user 0.13
sys 0.06

0001-dfa-use-algorithm-for-single-byte-character-to-any-s.patch
Description: Text document
--- End Message ---
--- Begin Message ---
Subject: Re: bug#23932: dfa: use algorithm for single byte character to any single byte character in input text always
Date: Thu, 1 Sep 2016 11:49:59 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0

Thanks for that set of patches too. I rebased it and tweaked NEWS and installed the resulting patch set (attached) into Savannah master.

0001-dfa-use-single-byte-algorithm-even-in-non-UTF-8.patch
Description: Text Data
0002-dfa-avoid-invalid-character-matching-period.patch
Description: Text Data
0003-dfa-document-previous-change.patch
Description: Text Data
0004-dfa-remove-separation-by-context-in-transition-in-no.patch
Description: Text Data
--- End Message ---