bug#16893: [PATCH] Avoid matching line-by-line for case-insensitive with
From: Jim Meyering
Subject: bug#16893: [PATCH] Avoid matching line-by-line for case-insensitive with grep
Date: Thu, 27 Feb 2014 18:54:14 -0800

On Thu, Feb 27, 2014 at 8:54 AM, Glenn Morris <address@hidden> wrote:
>
> This message was sent to the address@hidden address with no
> "Package:" specified in the body. So it ended up on the help-debbugs
> mailing list rather than bug-grep. I have assigned it to grep and am
> sending this mail, which will appear on the bug-grep list.
>
> For new reports, either use the bug-grep address, or remember to use
> Package: grep at the start of the body. They both have identical results.
>
> Norihiro Tanaka wrote:
>
>> Now the grep and awk matchers don't waste buffer in case-sensitive matching.
>> So I think that we can avoid line-by-line matching for them.
>>
>> This makes case-sensitive matching with the grep or awk matcher
>> without trivial_case_ignore as fast as matching with it.
>>
>> In bug#16232:
>>> The following times 2.16, 2.17 and 2.17+patch two ways:
>>>
>>> $ yes jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj | head -10000000 > k
>>> $ for i in 16 17 18; do echo $i; env LC_ALL=en_US.UTF-8 time
>>> /p/p/grep-2.$i/bin/grep -i foobar k; done
>>> 16
>>> 15.96 real 14.57 user 0.12 sys
>>> 17
>>> 1.13 real 1.07 user 0.06 sys
>>> 18
>>> 1.96 real 1.89 user 0.06 sys
>>>
>>> The above search takes more than 70% longer with the proposed patch.
>>
>> Therefore, I think 30% slow-down is caused by the line-by-line matching
>> for them.
>
> [See attachment at http://debbugs.gnu.org/16893]

Thank you for forwarding that, Glenn.
Thank you for the patch, Norihiro.

However, your removal of the "MB_CUR_MAX == 1" disjunct
would cause unibyte grep -i with (-F or -P) to match line-by-line,
whereas currently it uses the buffer-matching code.
That would make a search like this take 3 times longer:

  seq 30000000 > in
  LC_ALL=C grep -iF foo in

Without your patch, best of 5 wall clock time is 0.31s,
yet with the patch, it takes 1.00s.
I presume you will want to retain that disjunct?

If you submit an adjusted patch, please also include in the
commit log some timing examples showing how the change affects
grep's performance.
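
For timing examples like the "best of 5" figure above, a small helper
along the following lines could be used. This is only a sketch under
stated assumptions: the function name best_of is hypothetical (it is not
part of grep or its test suite), it relies on GNU date for the %N
nanosecond format, and it reports the smallest elapsed wall-clock time
over the requested number of runs.

```shell
# best_of RUNS CMD [ARGS...]
# Hypothetical helper: run CMD the given number of times and print the
# smallest elapsed wall-clock time in seconds (millisecond resolution).
# Assumes GNU date, whose %N format yields nanoseconds.
best_of() {
  runs=$1; shift
  best=-1
  i=0
  while [ "$i" -lt "$runs" ]; do
    start=$(date +%s%N)
    # Discard the command's output; grep exits 1 on "no match",
    # which is not an error for timing purposes.
    "$@" >/dev/null 2>&1 || true
    end=$(date +%s%N)
    elapsed=$(( (end - start) / 1000000 ))   # nanoseconds -> milliseconds
    if [ "$best" -lt 0 ] || [ "$elapsed" -lt "$best" ]; then
      best=$elapsed
    fi
    i=$((i + 1))
  done
  printf '%d.%03d\n' "$((best / 1000))" "$((best % 1000))"
}

# Example use, after creating the input with "seq 30000000 > in":
#   best_of 5 env LC_ALL=C grep -iF foo in
```

Running the example once with the unpatched binary and once with the
patched one gives a pair of numbers suitable for a commit log. Taking
the minimum rather than the mean reduces the influence of cache warm-up
and unrelated system load.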