[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[debbugs-tracker] bug#32750: closed ([PATCH 2/2] dfa: optmization of alt

From: GNU bug Tracking System
Subject: [debbugs-tracker] bug#32750: closed ([PATCH 2/2] dfa: optmization of alternation in NFA)
Date: Wed, 19 Sep 2018 04:13:01 +0000

Your message dated Tue, 18 Sep 2018 21:12:18 -0700
with message-id <address@hidden>
and subject line Re: bug#32750: [PATCH 2/2] dfa: optmization of alternation in 
has caused the debbugs.gnu.org bug report #32750,
regarding [PATCH 2/2] dfa: optmization of alternation in NFA
to be marked as done.

(If you believe you have received this mail in error, please contact

32750: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=32750
GNU Bug Tracking System
Contact address@hidden with problems
--- Begin Message --- Subject: [PATCH 2/2] dfa: optmization of alternation in NFA Date: Mon, 17 Sep 2018 23:14:52 +0900

Even when similer states exists in alternation, DFA treats them as
separate items.  It may complicate the transition in NFA and cause
slowdown.  This change assembles the states into one.

For example, ab|ac is changed into a(b|c).

This change speeds-up matching for many branched pattern.  For
example, grep speeds-up more than 30x in following case.

  seq 10000 | sed 's/$/ abcdefghijklmnopqrstuvwxyz/; s/$/./' >in
  time -p env LC_ALL=C grep -vf in in

  (before) real 63.43 user 61.67 system 1.65
  (after)  real  1.64 user  1.61 system 0.03

  # If we do not add '.' at last in pattern, not dfa but kwset is used.

grep also speeds-up about 3x in following case.

  time -p env LC_ALL=C grep -vf /usr/share/dict/linux.words 

  (before) real  2.48 user  2.09 system 0.38
  (after)  real  7.69 user  6.32 system 1.29


Attachment: 0001-dfa-simplify-initial-state.patch
Description: Text document

Attachment: 0002-dfa-optmization-of-alternation-in-NFA.patch
Description: Text document

--- End Message ---
--- Begin Message --- Subject: Re: bug#32750: [PATCH 2/2] dfa: optmization of alternation in NFA Date: Tue, 18 Sep 2018 21:12:18 -0700 User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1
Norihiro Tanaka wrote:
I thought of various things for the name of the function, but I could
not think of a good name.

Yes, that's a tough one. I eventually came up with maybe_disable_superset_dfa. Perhaps we can think of an ever better one.

I installed your patches into Gnulib, along with the attached relatively-minor fixups, and then propagated the result into grep by updating the Gnulib version that the grep master points to.

As you may notice, nowadays I prefer using signed types like ptrdiff_t to unsigned ones like size_t, as the signed types have a significant advantage in overflow checking with gcc -fsanitize=undefined. Perhaps at some point we should change the other instances of size_t in dfa.c, but one step at a time.

Thanks again for the performance improvement.

Attachment: 0001-dfa-reorder-enum-for-efficiency.patch
Description: Text Data

Attachment: 0002-dfa-prune-states-as-we-go.patch
Description: Text Data

Attachment: 0003-dfa-tweak-allocation-performance.patch
Description: Text Data

Attachment: 0004-dfa-use-more-informative-function-name.patch
Description: Text Data

--- End Message ---

reply via email to

[Prev in Thread] Current Thread [Next in Thread]