bug#22655: grep -Pz '^' now fails!


From: Paul Eggert
Subject: bug#22655: grep -Pz '^' now fails!
Date: Sat, 19 Nov 2016 23:57:22 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0

Stephane Chazelas wrote:
> I don't find a x220 factor, more like a x2.5 factor:

I think I found the factor-of-hundreds slowdown, and fixed it in the 2nd attached patch.

When I tried your benchmark with pcregrep (pcre 8.39, configured with --enable-unicode-properties), and with ./grep0 (which has the PCRE_MULTILINE implementation, i.e., commit da94c91a81fc63275371d0580d8688b6abd85346), and with ./grep (which is grep after the attached patches are installed), I got timings like the following:

    user  sys
    1.972 0.072 LC_ALL=en_US.utf8 pcregrep -u "z.*a" k
    0.234 0.076 LC_ALL=en_US.utf8 ./grep0 -P "z.*a" k
    1.280 0.064 LC_ALL=en_US.utf8 ./grep -P "z.*a" k
    1.487 0.077 LC_ALL=C pcregrep "z.*a" k
    0.193 0.067 LC_ALL=C ./grep0 -P "z.*a" k
    0.825 0.096 LC_ALL=C ./grep -P "z.*a" k

All times are CPU seconds. This is Fedora 24 x86-64, AMD Phenom II X4 910e. As before, k was created by the shell command: yes 'abcdefg hijklmn opqrstu vwxyz' | head -n 10000000 >k

So, on this benchmark, using PCRE_MULTILINE gave a speedup of a factor of ~4.3 in a multibyte locale and ~3.5 in a unibyte locale (comparing the user+sys totals for ./grep0 against those for ./grep).
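
As a rough check of those factors, the following sketch divides the user+sys totals taken from the timing table. The `speedup` helper is a hypothetical name used only for illustration; it is not part of grep or of the attached patches, and it assumes a POSIX shell with awk available.

```shell
# Hypothetical helper: speedup SLOW_USER SLOW_SYS FAST_USER FAST_SYS
# Prints (slow user+sys) / (fast user+sys), rounded to one decimal place.
speedup() {
    awk -v su="$1" -v ss="$2" -v fu="$3" -v fsys="$4" \
        'BEGIN { printf "%.1f\n", (su + ss) / (fu + fsys) }'
}

# Timings copied from the table: ./grep (patched) versus ./grep0 (PCRE_MULTILINE).
speedup 1.280 0.064 0.234 0.076   # LC_ALL=en_US.utf8, prints 4.3
speedup 0.825 0.096 0.193 0.067   # LC_ALL=C, prints 3.5
```

Comparing user time alone would give larger ratios, so the quoted factors appear to be based on total CPU time.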

> On the other hand if you change the pattern to "z[^+]*a",
> pcregrep still takes about one second, but GNU grep a lot longer

Yes, that example makes GNU grep -P look really bad. So I installed the 1st attached patch, which mostly just reverts the January multiline patch, i.e., it goes back to the slower "./grep -P" lines measured above.

Attachment: 0001-grep-P-no-longer-uses-PCRE_MULTILINE.patch
Description: Text Data

Attachment: 0002-grep-further-P-performance-fix.patch
Description: Text Data
