bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x)
From: Jim Meyering
Subject: bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales
Date: Wed, 19 Feb 2014 19:44:59 -0800
Hmm... it's not as clear-cut as I first thought.
(I built 2.17 plus the above patch and put it in a directory named grep-2.18.)
The following times 2.16, 2.17, and 2.17+patch in two ways:
$ yes jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj | head -10000000 > k
$ for i in 16 17 18; do echo $i; env LC_ALL=en_US.UTF-8 time /p/p/grep-2.$i/bin/grep -i foobar k; done
16
15.96 real 14.57 user 0.12 sys
17
1.13 real 1.07 user 0.06 sys
18
1.96 real 1.89 user 0.06 sys
The above search takes more than 70% longer with the proposed patch.
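To double-check the "more than 70%" figure, here is a tiny helper that computes the percentage increase from the real times above. The name pct_longer is made up for this note; it assumes only a POSIX shell and awk.

```shell
# Hypothetical helper: percentage by which NEW seconds exceed BASELINE seconds.
# Usage: pct_longer BASELINE NEW
pct_longer() {
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%.0f%%\n", (new / base - 1) * 100 }'
}

# 2.17 (1.13 s real) vs 2.17+patch (1.96 s real) in en_US.UTF-8, 10M lines.
pct_longer 1.13 1.96    # 73%
```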
Contrast that with performance in the non-UTF8 ja_JP.eucJP locale:
$ yes $(printf '%078dm' 0)|head -10000 > in
$ for i in 16 17 18; do echo $i; env LC_ALL=ja_JP.eucJP time /p/p/grep-2.$i/bin/grep -i n in; done
16
0.03 real 0.02 user 0.00 sys
17
2.98 real 2.96 user 0.00 sys
18
0.02 real 0.02 user 0.00 sys
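For reference, this is what each line of the "in" file looks like. The snippet is only an illustration, not part of the original test: printf zero-pads 0 to 78 digits and appends a literal "m", so neither "n" nor "N" ever occurs and `grep -i n in` has to scan all 10,000 lines without finding a match (the jjj/foobar runs are likewise all non-matching).

```shell
# Rebuild one line of the "in" file: '%078d' zero-pads the number 0 to a
# width of 78 digits, and the trailing "m" is literal text in the format.
line=$(printf '%078dm' 0)

printf '%s\n' "${#line}"              # 79 (78 zeros plus "m")
printf '%s\n' "$line" | cut -c 76-    # 000m

# The pattern "n" (or "N", given -i) never occurs, so every line is a non-match.
case $line in
  *[nN]*) echo "match" ;;
  *)      echo "no match" ;;
esac
```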
Using the jjj.../foobar example in ja_JP.eucJP, but with only 100k lines, we see
a roughly 200x performance regression going from grep-2.16 to 2.17:
$ yes jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj | head -100000 > k
$ for i in 16 17 18; do echo $i; env LC_ALL=ja_JP.eucJP time /p/p/grep-2.$i/bin/grep -i foobar k; done
16
0.15 real 0.14 user 0.00 sys
17
27.74 real 27.72 user 0.01 sys
18
0.11 real 0.11 user 0.00 sys
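A quick way to check the "roughly 200x" figure from the real times above. The helper name slowdown is hypothetical (chosen so it does not shadow coreutils' factor); it assumes a POSIX shell and awk.

```shell
# Hypothetical helper: slowdown factor = slower real time / faster real time.
# Usage: slowdown FAST SLOW
slowdown() {
  awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.0fx\n", slow / fast }'
}

slowdown 0.15 27.74   # 2.17 vs 2.16 in ja_JP.eucJP: 185x
slowdown 0.11 27.74   # 2.17 vs 2.17+patch:          252x
```

So by wall-clock time the 2.16 to 2.17 regression on this input is about 185x.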
Obviously, I want to retain all of 2.17's performance gain in UTF-8 locales,
while avoiding the 200x penalty in multi-byte non-UTF8 locales like ja_JP.eucJP.
So I have prepared a better patch.
With the two attached commits (applied on top of 2.17), I get these timings,
i.e., the same 200x improvement with ja_JP.eucJP, and no regression
with en_US.UTF-8:
$ for i in 16 17 18; do printf "$i: "; env LC_ALL=ja_JP.eucJP time /p/p/grep-2.$i/bin/grep -i foobar k; done
16: 0.14 real 0.14 user 0.00 sys
17: 27.97 real 27.95 user 0.01 sys
18: 0.12 real 0.12 user 0.00 sys
$ for i in 16 17 18; do printf "$i: "; env LC_ALL=en_US.UTF-8 time /p/p/grep-2.$i/bin/grep -i foobar k; done
16: 0.13 real 0.12 user 0.00 sys
17: 0.01 real 0.01 user 0.00 sys
18: 0.01 real 0.01 user 0.00 sys
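Since the loops above print one "VERSION: REAL real USER user SYS sys" line per version, a small awk filter can turn them into speedups relative to 2.17. This is a hypothetical helper (speedup_vs_17 is not part of grep or of the attached commits), sketched assuming a POSIX shell and awk.

```shell
# Hypothetical helper: read lines like "17: 27.97 real 27.95 user 0.01 sys"
# and print each version's speedup over 2.17, i.e. 2.17's real time divided
# by that version's real time.
speedup_vs_17() {
  awk '
    { ver[NR] = $1; real[NR] = $2 }
    $1 == "17:" { base = $2 }
    END {
      for (i = 1; i <= NR; i++)
        printf "%s %.1fx\n", ver[i], base / real[i]
    }
  '
}

# ja_JP.eucJP timings from above.
speedup_vs_17 <<'EOF'
16: 0.14 real 0.14 user 0.00 sys
17: 27.97 real 27.95 user 0.01 sys
18: 0.12 real 0.12 user 0.00 sys
EOF
```

In en_US.UTF-8 both 2.17 and 2.17+patch report 0.01 s, which is at the resolution of the timer, so there the ratio only says "no measurable regression".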