emacs-bug-tracker
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#44754: closed (Extreme performance degradation in GNU grep 3.4 / 3.6


From: GNU bug Tracking System
Subject: bug#44754: closed (Extreme performance degradation in GNU grep 3.4 / 3.6)
Date: Fri, 27 Nov 2020 07:42:03 +0000

Your message dated Thu, 26 Nov 2020 21:41:20 -1000
with message-id 
<CA+8g5KET6Y_EiJmy_wbbfgrSnaak1CUVB5sLj2DKHeMU2JBRpQ@mail.gmail.com>
and subject line Re: bug#44754: Extreme performance degradation in GNU grep 3.4 
/ 3.6
has caused the debbugs.gnu.org bug report #44754,
regarding Extreme performance degradation in GNU grep 3.4 / 3.6
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs@gnu.org.)


-- 
44754: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=44754
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems
--- Begin Message --- Subject: Extreme performance degradation in GNU grep 3.4 / 3.6 Date: Fri, 20 Nov 2020 06:20:55 +0100
Hi,

I have a use case where I run grep with a large number of search
patterns on a large text file. It works well with grep-3.3, but with
grep-3.4 it quickly burned through GBs of memory and almost locked
up my system due to swapping.

To avoid attaching those large files, I could mostly reproduce the
effects like this:

ulimit -d 5000000  # avoid system lockup due to excessive swapping
export LC_ALL=C    # make sure no Unicode case conversions are needed

% time  ./grep-3.3 -Fwif <(seq 300000 | tr 0-9 A-J) <<<foo

real    0m0.054s
user    0m0.048s
sys     0m0.012s

% time  ./grep-3.4 -Fwif <(seq 30000 | tr 0-9 A-J) <<<foo
./grep-3.4: Memory exhausted
Aborted

real    0m1.291s
user    0m0.696s
sys     0m0.599s

% time  ./grep-3.6 -Fwif <(seq 300000 | tr 0-9 A-J) <<<foo

real    0m13.162s
user    0m12.955s
sys     0m0.211s

grep-3.3 behaves well, even with much larger number of patterns.
Time seems to grow linearly, and memory usage is constant.

grep-3.4 behaves the worst of these 3 versions. Even with just 30000
patterns it exceeds the ulimit of 5 GB.

grep-3.6 behaves a bit better than 3.4, but still bad. Time seems to
be quadratic in the number of patterns, and though memory usage in
this case seems to be almost constant, in my actual use case it also
runs out of memory where grep-3.3 works well with just a few 100 MB
used.

Without "-i", grep-3.4 seems to run as fast as grep-3.3, but
grep-3.6 is almost as slow as with "-i".

So there might actually be two different issues here, one that
affects 3.4 with "-i" and one that affects 3.6 with or without "-i".

Regards,
Frank



--- End Message ---
--- Begin Message --- Subject: Re: bug#44754: Extreme performance degradation in GNU grep 3.4 / 3.6 Date: Thu, 26 Nov 2020 21:41:20 -1000
On Thu, Nov 26, 2020 at 9:03 AM Jim Meyering <jim@meyering.net> wrote:
>
> On Wed, Nov 25, 2020 at 3:12 PM Jim Meyering <jim@meyering.net> wrote:
> > Thank you for the fine bug report.
> > The grep-3.6 bug you've exposed is due to the fact that your input
> > triggers excessive hash collisions when using the code modeled after
> > gnulib/lib/hash-pjw.c. That made the new pattern-preprocessing phase
> > take O(N^2) time for N patterns. In the attached, I've switched grep
> > to use the djb2 hash function, and that resolves the problem. I'll
> > also add a NEWS entry and a test before pushing this.
>
> Timings suggest that grep-3.6's preprocessing came closer to O(N^3).
> Here's an example that would take 2-3 days with grep-3.6 and only
> seconds with this fix:
>
>   : | grep -Ff <(seq 6400000 | tr 0-9 A-J)
>
> Here's a complete patch.
> I'll push it later today.

Pushed along with two gnulib-related changes.


--- End Message ---

reply via email to

[Prev in Thread] Current Thread [Next in Thread]