bug#38690: grep -P quadratic on long lines
From: Martin Raszyk
Subject: bug#38690: grep -P quadratic on long lines
Date: Fri, 20 Dec 2019 10:54:14 +0100
Dear grep maintainers,
I've realized that grep -P takes quadratic time on long lines with
many short matches. For instance, executing './src/grep -P -o "a"
a.txt > a.out' on a file 'a.txt' consisting of N characters 'a' takes
time quadratic in N. I've used grep-3.3 and pcre-8.43 for the
benchmarks.
The root causes for this behavior are as follows:
1. in src/pcresearch.c on line 222 (at commit
cf09252295c554dd3eba4cdb8eb53911fb495f40), the end of the line is
located again each time a new match is searched for; this alone
already results in quadratic runtime in the above-mentioned case
2. the function 'pcre_exec' from pcre-8.43, called in src/pcresearch.c
on line 71 for each match, checks whether the provided string is valid
UTF-8 (code implemented in pcre_valid_utf8.c); this also results in
quadratic runtime
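To illustrate why repeating a linear scan for every match adds up to
quadratic work, here is a minimal Python sketch. It is a simplified
model and not grep's actual code: the function names are hypothetical,
and it only counts the bytes touched while locating the end of the
line, once with a rescan before every match attempt and once with the
end of the line located a single time. The same counting argument
would apply to any per-match pass over the subject, such as the UTF-8
validity check described in item 2.

```python
def bytes_scanned_rescanning(line: bytes, needle: bytes) -> int:
    """Count bytes touched when the end of line is searched for again
    before every match attempt (hypothetical model of root cause 1)."""
    if not needle:
        raise ValueError("needle must be non-empty")
    scanned = 0
    pos = 0
    while True:
        # Locate the newline again, starting at the current position.
        end = line.index(b"\n", pos)
        scanned += end - pos + 1
        hit = line.find(needle, pos, end)
        if hit < 0:
            return scanned
        pos = hit + len(needle)


def bytes_scanned_cached(line: bytes, needle: bytes) -> int:
    """Count bytes touched when the end of line is located only once
    and reused for all following matches on that line."""
    if not needle:
        raise ValueError("needle must be non-empty")
    end = line.index(b"\n")
    scanned = end + 1
    pos = 0
    while True:
        hit = line.find(needle, pos, end)
        if hit < 0:
            return scanned
        pos = hit + len(needle)


if __name__ == "__main__":
    for n in (1000, 2000, 4000):
        line = b"a" * n + b"\n"
        print(n,
              bytes_scanned_rescanning(line, b"a"),
              bytes_scanned_cached(line, b"a"))
```

For a line of N characters 'a', the rescanning variant touches
(N+1)(N+2)/2 bytes in this model, while the cached variant touches
N+1, so doubling N roughly quadruples the first count and only
doubles the second.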
On your side, it is possible to fix the first root cause. I'll post an
e-mail to the PCRE mailing list about the second root cause.
Best regards,
Martin