bug-grep


From: Jaroslav Škarvada
Subject: [bug #38594] Poor grep performance for long regexp compared to performance with -P option
Date: Tue, 26 Mar 2013 07:46:57 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:19.0) Gecko/20100101 Firefox/19.0

URL:
  <http://savannah.gnu.org/bugs/?38594>

                 Summary: Poor grep performance for long regexp compared to performance with -P option
                 Project: grep
            Submitted by: yarda
            Submitted on: Tue 26 Mar 2013 07:46:55 AM GMT
                Category: None
                Severity: 3 - Normal
              Item Group: None
                  Status: None
                 Privacy: Public
             Assigned to: None
             Open/Closed: Open
         Discussion Lock: Any

    _______________________________________________________

Details:

This was originally reported in:
http://bugzilla.redhat.com/show_bug.cgi?id=875131

There is a huge gap between the performance of grep and grep -P for certain
regular expressions.

Steps to Reproduce:
1. 

PATTERN="^.......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
00000000000"

INPUTLINE="..................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
00000000000"

for i in `seq -w 1 10000`; do echo ${i}${INPUTLINE}${i} >> /tmp/input ;done

2. time grep -P -v "$PATTERN" /tmp/input
3. time grep -v "$PATTERN" /tmp/input
4. time grep -v "^.\{1143\} 0\{11\}" /tmp/input
5. time grep -P -v "^.{1143} 0{11}" /tmp/input
6. export LANG=C
7. repeat 2., 3.
8. export LANG=en_US.iso88591
9. repeat 2., 3.
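
The steps above can also be sketched as a small bash script that builds the
input programmatically instead of pasting the long literal strings. This is
only an illustration: the function names (make_input, count_bre, count_pcre)
and their arguments are hypothetical and not part of the original report, and
count_pcre assumes a grep built with PCRE support.

```shell
#!/bin/bash
# Sketch only: function names and arguments are illustrative, not from the report.

# make_input DOTS LINES FILE
# Writes LINES lines of the form <i><DOTS dots><space><11 zeros><i> to FILE,
# where <i> is the zero-padded line number produced by seq -w.
make_input() {
    local dots=$1 lines=$2 file=$3 body i
    body="$(printf '%*s' "$dots" '' | tr ' ' '.') 00000000000"
    for i in $(seq -w 1 "$lines"); do
        printf '%s%s%s\n' "$i" "$body" "$i"
    done > "$file"
}

# count_bre DOTS FILE
# Counts the lines that do NOT match, using the default (basic regexp) matcher.
count_bre() {
    grep -c -v "^.\{$1\} 0\{11\}" "$2"
}

# count_pcre DOTS FILE
# Same count with -P (assumes grep was built with PCRE support).
count_pcre() {
    grep -P -c -v "^.{$1} 0{11}" "$2"
}

# Example use with the sizes from the report (may run for a long time
# on an affected grep):
#   make_input 1143 10000 /tmp/input
#   time count_bre 1143 /tmp/input
#   time count_pcre 1143 /tmp/input
```

Counting with -c instead of printing the lines keeps terminal output out of
the timing, and passing the sizes as arguments makes it easy to try a smaller
case first.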

Actual results:

grep -P is 300-7000x faster than grep without the -P option. This holds for
all combinations of LANG and for both $PATTERN and the shorter patterns
("^.\{1143\} 0\{11\}" without -P, "^.{1143} 0{11}" with -P).

Expected results:

Performance of grep with and without -P is comparable when using the same
pattern.





    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?38594>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



