From: GNU bug Tracking System
Subject: [debbugs-tracker] bug#26193: closed ([0-9] versus [[:digit:]])
Date: Thu, 23 Mar 2017 01:58:02 +0000
Your message dated Wed, 22 Mar 2017 18:57:05 -0700
with message-id <address@hidden>
and subject line Re: bug#26193: [0-9] versus [[:digit:]]
has caused the debbugs.gnu.org bug report #26193,
regarding [0-9] versus [[:digit:]]
to be marked as done.

(If you believe you have received this mail in error, please contact
address@hidden)

-- 
26193: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=26193
GNU Bug Tracking System
Contact address@hidden with problems
--- Begin Message ---
Subject: [0-9] versus [[:digit:]]
Date: Mon, 20 Mar 2017 11:34:05 -0400

In what follows, file "conjectures" is a 6-billion-byte file in which each line contains at most one letter P, and few lines (see output) have a digit following a P. "rusage" is just a home-brew resource-usage summary command.

rusage egrep 'P[0-9]' conjectures > xxx
695.55 real 688.33 user 2.40 sys 0 pf 186 pr 0 sw 0 rb 8 wb 1 vcx 19206 icx 2488 mx 0 ix 0 id 0 is
cat xxx
A[21]=11{11}:22<LP3

rusage egrep 'P[[:digit:]]' conjectures > xxx
14.88 real 13.36 user 1.43 sys 0 pf 186 pr 0 sw 0 rb 8 wb 0 vcx 516 icx 2500 mx 0 ix 0 id 0 is
cat xxx
A[21]=11{11}:22<LP3

Using what is to me the more obvious [0-9] pattern takes almost 50 times as long as using the [[:digit:]] pattern. Seems very strange.

grep --version
grep (GNU grep) 2.25
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and others, see <http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.

uname -a
Linux jpl 4.4.0-66-generic #87-Ubuntu SMP Fri Mar 3 15:29:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
--- End Message ---
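The report compares two bracket expressions that, on ASCII input, should select exactly the same lines. The following is a minimal sketch for checking that equivalence on a small scale before timing anything. The helper name compare_patterns and the sample lines are hypothetical (only the matching line is copied from the report), and grep -E stands in for the older egrep command.

```shell
#!/bin/sh
# Sketch only: compare_patterns and the sample data are hypothetical,
# not taken from the bug report. grep -E replaces the older egrep.

# Run both bracket expressions over FILE ($1). If they select the same
# lines, print those lines and return 0; otherwise return 1.
compare_patterns() {
  range_out=$(grep -E 'P[0-9]' "$1")        # range expression
  class_out=$(grep -E 'P[[:digit:]]' "$1")  # character class
  [ "$range_out" = "$class_out" ] || return 1
  printf '%s\n' "$range_out"
}

# Tiny stand-in for "conjectures": at most one P per line, and only
# one line has a digit immediately after its P.
sample=$(mktemp)
printf '%s\n' 'A[1]=2:3<LQ' 'xyzP' 'A[21]=11{11}:22<LP3' 'Pa' > "$sample"
compare_patterns "$sample"
rm -f "$sample"
```

This sample only confirms that the two patterns agree; it is far too small to show the timing gap. To compare run times on a large file, as in the report, wrap each search in the shell's time keyword, e.g. time grep -E 'P[0-9]' conjectures > /dev/null.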
--- Begin Message ---
Subject: Re: bug#26193: [0-9] versus [[:digit:]]
Date: Wed, 22 Mar 2017 18:57:05 -0700

On Wed, Mar 22, 2017 at 2:58 PM, John P. Linderman <address@hidden> wrote:
> I used to use LC_ALL=C, but, as I vaguely recall, it got in the way of
> dealing with UNICODE. I tried a couple LC values aimed at UNICODE and the
> US, but something always went pear-shaped. I finally gave up. I am perfectly
> happy to suffer a tiny bit of performance, to have most things work without
> thinking. A factor of 6, or 35, is not tiny, since I use grep and friends
> intensely. That's how I discovered the performance problem to begin with.
> Anyway, thank you for fixing my problem. I suspect that many of us pioneers
> (using UNIX since 1973) have '[0-9]' wired into our fingers.
>
> On Wed, Mar 22, 2017 at 2:01 PM, Paul Eggert <address@hidden> wrote:
>>
>> On 03/22/2017 05:44 AM, John P. Linderman wrote:
>>>
>>> That puts the runtimes on equal footing:
>>>
>> In my measurements, P[0-9] is still a tiny bit slower if one is using
>> glibc regex, due to a performance problem in glibc. You can work around it
>> by configuring --with-included-regex. It's probably not worth worrying
>> about, though.
>>
>> By the way, using LC_ALL=C should help avoid performance problems like
>> these in the future, if all you're doing is something where single-byte
>> pattern matching suffices.

I've just pulled that gnulib change into grep's repository with the attached, along with a NEWS update:

grep-gnulib-dfa-NEWS.diff
Description: Text document
--- End Message ---
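Paul Eggert's suggestion of LC_ALL=C and John Linderman's objection that it interfered with Unicode can both be accommodated by scoping the setting to individual grep runs rather than the whole session. A minimal sketch, assuming a POSIX shell; the wrapper name c_grep is hypothetical.

```shell
#!/bin/sh
# Sketch only: c_grep is a hypothetical wrapper name.
# The LC_ALL=C prefix applies only to this one grep process, so the
# session's usual (e.g. UTF-8) locale stays in effect everywhere else.
c_grep() {
  LC_ALL=C grep "$@"
}

# In the C locale, [0-9] means exactly the ASCII digits 0 through 9 and
# matching is byte by byte. Use this only where single-byte matching
# suffices (e.g. ASCII data like the report's sample line).
printf '%s\n' 'A[21]=11{11}:22<LP3' 'xyzP' 'Pa' | c_grep -E 'P[0-9]'
```

For the small remaining gap that Paul attributes to glibc regex, his other suggestion is to build grep from source with ./configure --with-included-regex, though he adds that it is probably not worth worrying about.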