bug-grep
bug#34053: [PATCH] grep: fix slow for multiple word matching


From: Norihiro Tanaka
Subject: bug#34053: [PATCH] grep: fix slow for multiple word matching
Date: Sun, 13 Jan 2019 08:45:47 +0900

Hi,

grep uses the kwset matcher for multiple-word matching (-w with several
patterns).  It is very slow when most of the candidates that match a pattern
are not whole words.  So, if the first candidate that matches a pattern is not
a word, fall back to the grep matcher for that line.

By the way, if START_PTR is set, the grep matcher uses the regex matcher, which
is very slow at matching words.  Therefore, we use the grep matcher only when
START_PTR is not set.
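
To illustrate what "matched but not a word" means here, the following is a
minimal sketch that uses whatever grep is installed, not the patched binary.
The two-token input line and the variable names (line, plain, words) are
chosen only for illustration.

```shell
# One input line in the style of the reproduction: every token is "00".
line='00 00'

# Without -w, both patterns occur as substrings, so the line is counted.
plain=$(printf '%s\n' "$line" | grep -c -e '0' -e '00 0')

# With -w, every candidate is adjacent to another "0", so none is a word.
# grep exits with status 1 when nothing matches, hence the "|| true".
words=$(printf '%s\n' "$line" | grep -c -w -e '0' -e '00 0' || true)

echo "substring matches: $plain, word matches: $words"
```

Each such rejected candidate costs the matcher a word-boundary check, which
is the work that dominates when almost no candidate is a whole word.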

Here is an example, although it is a very extreme case:

$ cat >pat <<EOF
0
00 0
00 00 0
00 00 00 0
00 00 00 00 0
00 00 00 00 00 0
00 00 00 00 00 00 0
00 00 00 00 00 00 00 0
00 00 00 00 00 00 00 00 0
00 00 00 00 00 00 00 00 00 0
00 00 00 00 00 00 00 00 00 00 0
00 00 00 00 00 00 00 00 00 00 00 0
00 00 00 00 00 00 00 00 00 00 00 00 0
00 00 00 00 00 00 00 00 00 00 00 00 00 0
EOF
$ yes '00 00 00 00 00 00 00 00 00 00 00 00 00' | head -1000000 >inp
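
The pattern file can also be generated with a loop, which makes it easy to
vary the number of patterns when experimenting.  This is only a sketch for a
POSIX shell; the helper name gen_pat and its variables are hypothetical and
not part of the patch.

```shell
# Hypothetical helper: print N patterns, where pattern i consists of
# (i - 1) copies of "00 " followed by a single "0".
gen_pat() {
  n=$1
  prefix=
  i=1
  while [ "$i" -le "$n" ]; do
    printf '%s0\n' "$prefix"
    prefix="${prefix}00 "
    i=$((i + 1))
  done
}

# Recreate the 14-line pattern file used in the reproduction.
gen_pat 14 >pat
```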

$ env LC_ALL=C time -p src/grep -wf pat inp
real 5.75
user 5.67
sys 0.02

Retrying after applying the patch:

$ env LC_ALL=C time -p src/grep -wf pat inp
real 0.32
user 0.31
sys 0.00

Thanks,
Norihiro

Attachment: 0001-grep-fix-slow-multiple-word-matching.patch
Description: Text document

