bug-grep

bug#18454: Improve performance when -P (PCRE) is used in UTF-8 locales


From: Norihiro Tanaka
Subject: bug#18454: Improve performance when -P (PCRE) is used in UTF-8 locales
Date: Sat, 20 Dec 2014 11:45:15 +0900

On Sat, 20 Dec 2014 02:23:27 +0100
Vincent Lefevre <address@hidden> wrote:

> Debian grep 2.20-3      6.64s (with -P)
> Upstream grep 2.21      5.39s (with -P)
> Debian pcregrep 8.35    0.71s

Did you use pcregrep --utf-8?  You should use pcregrep --utf-8 for a
fair comparison.  By the way, pcregrep --utf-8 does not support binary
files.  If pcregrep finds 20 errors, it exits without reading the rest
of the input.

$ yes src/grep | head -1000 | xargs cat > big_grep
$ ls -l big_grep
-rw-r--r--. 1 staff users 611453000 Dec 20 11:30 big_grep
$ time -p env LC_ALL=en_US.utf8 src/grep -P test big_grep
real 10.16
user 10.09
sys 0.07
$ time -p pcregrep --buffer-size=65536 test big_grep
real 1.50
user 1.41
sys 0.09
$ time -p pcregrep --buffer-size=65536 --utf-8 test big_grep 2>&1 | tail -1
pcregrep: Too many errors - abandoned.
real 0.00
user 0.00
sys 0.00
$ pcregrep --version
pcregrep version 8.36 2014-09-26
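
The "Too many errors" result above is consistent with big_grep not being valid UTF-8, since it is built from 1000 copies of the src/grep binary. A minimal sketch for checking an input file before timing a UTF-8 mode comparison follows. It assumes iconv is installed; the helper name is_valid_utf8 and the two sample files are hypothetical and are not part of the original message.

```shell
#!/bin/sh
# Hypothetical helper (not from the original message): succeeds only if
# the file decodes cleanly as UTF-8. iconv exits non-zero when it meets
# an invalid input sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Two small illustrative inputs: plain text, and text containing bytes
# that can never occur in well-formed UTF-8 (0xFF and 0xFE).
tmpdir=$(mktemp -d)
printf 'plain test line\n' > "$tmpdir/text_sample"
printf 'test \377\376 bytes\n' > "$tmpdir/binary_sample"

for f in "$tmpdir/text_sample" "$tmpdir/binary_sample"; do
    if is_valid_utf8 "$f"; then
        echo "$(basename "$f"): valid UTF-8"
    else
        echo "$(basename "$f"): invalid UTF-8"
    fi
done

rm -rf "$tmpdir"
```

If the benchmark file fails this check, a pcregrep --utf-8 timing on it probably reflects an early exit rather than a full scan, so a text-only input would be a safer basis for comparison.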





