emacs-bug-tracker

[debbugs-tracker] bug#26193: closed ([0-9] versus [[:digit:]])


From: GNU bug Tracking System
Subject: [debbugs-tracker] bug#26193: closed ([0-9] versus [[:digit:]])
Date: Thu, 23 Mar 2017 01:58:02 +0000

Your message dated Wed, 22 Mar 2017 18:57:05 -0700
with message-id <address@hidden>
and subject line Re: bug#26193: [0-9] versus [[:digit:]]
has caused the debbugs.gnu.org bug report #26193,
regarding [0-9] versus [[:digit:]]
to be marked as done.

(If you believe you have received this mail in error, please contact
address@hidden)


-- 
26193: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=26193
GNU Bug Tracking System
Contact address@hidden with problems
--- Begin Message ---
Subject: [0-9] versus [[:digit:]]
Date: Mon, 20 Mar 2017 11:34:05 -0400

In what follows, the file "conjectures" is a 6-billion-byte file in which each line contains at most one letter P, and few lines (see the output below) have a digit following the P. "rusage" is just a home-brew resource-usage summary command.

  rusage egrep 'P[0-9]' conjectures > xxx     
695.55 real 688.33 user 2.40 sys 0 pf 186 pr 0 sw 0 rb 8 wb 1 vcx 19206 icx 2488 mx 0 ix 0 id 0 is

  cat xxx
A[21]=11{11}:22<LP3

  rusage egrep 'P[[:digit:]]' conjectures > xxx
14.88 real 13.36 user 1.43 sys 0 pf 186 pr 0 sw 0 rb 8 wb 0 vcx 516 icx 2500 mx 0 ix 0 id 0 is

  cat xxx
A[21]=11{11}:22<LP3

Using what is to me the more obvious [0-9] pattern takes almost 50 times as long as using the [[:digit:]] pattern (695.55 s versus 14.88 s of real time). That seems very strange.
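
For readers who want to try a comparison like this without the original data or the home-brew "rusage" command, here is a minimal sketch. It assumes a small, made-up sample file in place of "conjectures" and uses "grep -E" (the newer spelling of egrep) with "date +%s" for coarse wall-clock timing. On a file this small both runs finish almost instantly; any real difference would only show up on a much larger input and with an affected grep build and locale.

```shell
# Hypothetical stand-in for the "conjectures" file: many lines where P is
# not followed by a digit, plus one line where it is.
sample=$(mktemp)
i=0
while [ "$i" -lt 1000 ]; do
  printf 'line %s: LPx\n' "$i"
  i=$((i + 1))
done > "$sample"
printf '%s\n' 'A[21]=11{11}:22<LP3' >> "$sample"

# Run each pattern, recording the match count and elapsed whole seconds.
counts=""
for pattern in 'P[0-9]' 'P[[:digit:]]'; do
  start=$(date +%s)
  count=$(grep -E -c "$pattern" "$sample")
  end=$(date +%s)
  printf '%s: %s matching line(s), %s s elapsed\n' \
    "$pattern" "$count" "$((end - start))"
  counts="$counts $count"
done

rm -f "$sample"
```

Checking that both patterns report the same number of matching lines before comparing times guards against measuring two searches that are not actually equivalent on the data.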

  grep --version
grep (GNU grep) 2.25
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and others, see <http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.

  uname -a
Linux jpl 4.4.0-66-generic #87-Ubuntu SMP Fri Mar 3 15:29:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux






--- End Message ---
--- Begin Message ---
Subject: Re: bug#26193: [0-9] versus [[:digit:]]
Date: Wed, 22 Mar 2017 18:57:05 -0700

On Wed, Mar 22, 2017 at 2:58 PM, John P. Linderman <address@hidden> wrote:
> I used to use LC_ALL=C, but, as I vaguely recall, it got in the way of
> dealing with UNICODE. I tried a couple LC values aimed at UNICODE and the
> US, but something always went pear-shaped. I finally gave up. I am perfectly
> happy to sacrifice a tiny bit of performance to have most things work without
> thinking. A factor of 6, or 35, is not tiny, since I use grep and friends
> intensely. That's how I discovered the performance problem to begin with.
> Anyway, thank you for fixing my problem. I suspect that many of us pioneers
> (using UNIX since 1973) have '[0-9]' wired into our fingers.
>
> On Wed, Mar 22, 2017 at 2:01 PM, Paul Eggert <address@hidden> wrote:
>>
>> On 03/22/2017 05:44 AM, John P. Linderman wrote:
>>>
>>> That puts the runtimes on equal footing:
>>>
>> In my measurements, P[0-9] is still a tiny bit slower if one is using
>> glibc regex, due to a performance problem in glibc. You can work around it
>> by configuring --with-included-regex. It's probably not worth worrying
>> about, though.
>>
>> By the way, using LC_ALL=C should help avoid performance problems like
>> these in the future, if all you're doing is something where single-byte
>> pattern matching suffices.
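
The LC_ALL=C suggestion quoted above does not have to be a session-wide setting. A minimal sketch, assuming a small made-up sample file rather than the original data: writing the assignment as a prefix to a single command applies the C locale to that one grep invocation only, so the rest of the shell session keeps whatever locale it had.

```shell
# Hypothetical sample data standing in for the "conjectures" file.
sample=$(mktemp)
printf '%s\n' 'A[21]=11{11}:22<LP3' 'B[7]=Pxyz' 'no capital letter here' > "$sample"

# The LC_ALL=C prefix affects only this grep process. In the C locale,
# [0-9] is a plain single-byte range.
matches=$(LC_ALL=C grep -E 'P[0-9]' "$sample")
printf '%s\n' "$matches"

# The shell's own LC_ALL is unchanged afterwards.
printf 'LC_ALL in the shell: %s\n' "${LC_ALL-unset}"

rm -f "$sample"
```

This only suits searches where single-byte matching suffices, as the quoted advice says; patterns or data that rely on multibyte characters still need a UTF-8 locale.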

I've just pulled that gnulib change into grep's repository with the
attached, along with a NEWS update:

Attachment: grep-gnulib-dfa-NEWS.diff
Description: Text document


--- End Message ---
