[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

grep -f scales extremely poorly with number of lines in pattern file

From: Narfi Stefansson
Subject: grep -f scales extremely poorly with number of lines in pattern file
Date: Sun, 19 Mar 2006 23:28:48 -0500
User-agent: KMail/1.8.2

Hi all,

I was expecting it to take order n time to match against n patterns, where n 
is the number of lines in the pattern file.

What I am experiencing:  The execution time of grep -f behaves like n^a, with 
a somewhere in the range [2.5, 3.0]. 

The execution time with a 100 line pattern file was 0.17 sec, but the 
execution time with a 1000 line pattern file was 59.7 seconds, nowhere near 
the 10 * 0.17 sec = 1.7 seconds that I expected.

Details  and reproduction steps are given below.



Here is what I did, I created 1000 strings and 1000 regexps:

cat /dev/urandom  | od -x | head -1000 > regexps.txt 
cat /dev/urandom  | od -x | head -1000 > text.txt

Avoid any problems with the locale:
export LC_ALL=C
export LANG=C

I then timed the execution of matching against the first 100, 200, etc 
patterns in regexps.txt:
for n in 100 200 300 400 500 600 700 800 900 1000; do 
    head -$n regexps.txt > current_regexps.txt
    /usr/bin/time grep -f current_regexps.txt text.txt

Following is the user time in seconds as reported by /usr/bin/time, for n in 
100, 200, ..., 1000:

0.17 0.80 2.08 4.10 7.02 11.33 17.31 28.68 41.39 59.70

This is on Mandriva 2006.
% grep --version
grep (GNU grep) 2.5.1

reply via email to

[Prev in Thread] Current Thread [Next in Thread]