From: | Paul Eggert |
Subject: | bug#40634: Massive pattern list handling with -E format seems very slow since 2.28. |
Date: | Sun, 13 Sep 2020 19:03:33 -0700 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 |
On 9/11/20 11:41 PM, Jim Meyering wrote:
>> https://bugs.gnu.org/40634#32 I'll try to take a look at the later patch.
> Oh! Glad you spotted that.
I took a look and the basic idea sounds good though I admit I did not check every detail. While looking into it I found some opportunities for improvements, plus I found what appear to be some longstanding bugs in the area, one of which causes a grep test failure on Solaris (and I suspect the bug is also on GNU/Linux but the grep tests don't catch it). I installed the attached patches into Gnulib, updated grep to point to the new Gnulib version, and added a note in grep's NEWS file about this.
Patch 1 is what Norihiro Tanaka proposed in Bug#40634#32, except I edited the commit message. Patch 2 consists of minor cleanups and performance tweaks for Patch 1. (Patches 3 and 4 are omitted as they were installed by others into Gnulib at about the same time I was installing these.) Patch 5 fixes a dfa-heap-overrun failure on Solaris that appears to be a longstanding bug exposed by Patch 1 when running on Solaris. Patch 6 merely cleans up code near Patch 5. Patch 7 fixes the use of an uninitialized constraint, which I discovered while debugging Patch 5 under Valgrind; this also appears to be a longstanding bug.
Coming up with test cases for all these bugs would be pretty tricky, unfortunately.
0001-dfa-use-backward-set-in-removal-of-epsilon-closure.patch
Description: Text Data
0002-dfa-epsilon-closure-tweaks-Bug-40634.patch
Description: Text Data
0005-dfa-fix-dfa-heap-overrun-failure.patch
Description: Text Data
0006-dfa-assume-C99-in-reorder_tokens.patch
Description: Text Data
0007-dfa-avoid-use-of-uninitialized-constraint.patch
Description: Text Data