Re: gsub() is very slow in gawk 5.1.0

From: Ed Morton
Subject: Re: gsub() is very slow in gawk 5.1.0
Date: Wed, 14 Jul 2021 22:24:07 -0500

On 7/14/2021 8:20 AM, Ed Morton wrote:
On an online forum someone asked how to generate a string of 100,000,000 "x"s. They had tried this in a BEGIN section:

   for(i=1;i<=100000000;i++) s = s "x"

and wanted to know if there was a better approach. Someone suggested:

   s=sprintf("%*s",100000000,""); gsub(/ /,"x",s)

which is what I'd have suggested too. Upon testing, though, they found that the sprintf+gsub approach was slower than the loop in gawk 5.1.0. I couldn't reproduce that exactly on cygwin, but I can confirm that the sprintf+gsub solution is much slower than I expected:

   $ time awk 'BEGIN{for(i=1;i<=100000000;i++) s = s "x"}'

   real    1m19.439s
   user    0m28.562s
   sys     0m50.811s

   $ time awk 'BEGIN{s=sprintf("%*s",100000000,""); gsub(/ /,"x",s)}'

   real    0m36.604s
   user    0m36.093s
   sys     0m0.390s

If I remove the gsub() then it runs in half a second:

   $ time awk 'BEGIN{s=sprintf("%*s",100000000,"")}'

   real    0m0.423s
   user    0m0.171s
   sys     0m0.202s

so the gsub() itself is taking over 36 seconds to run. Someone else ran the script on a Mac with BSD awk 20070501 and got:

   $ time awk 'BEGIN {s = sprintf("%*s", 100000000, ""); gsub(/ /, "x", s)}'

   real    0m1.744s
   user    0m1.645s
   sys     0m0.098s

i.e. it ran in under 2 seconds, while another person said the gawk solution took 23.5 seconds on their Mac.

So something is causing gsub() in gawk 5.1.0 to run very slowly for this case.
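For what it's worth, here is a rough sketch of a third way to build the string that avoids both the character-at-a-time loop and gsub(): repeated doubling, which needs only about log2(n) concatenations. The function name `repeat_x` is made up for illustration, and I have not timed this against the runs above, so treat it as an idea to test rather than a measured improvement.

```shell
# Hypothetical helper (not from the original thread): print n copies of "x"
# by repeated doubling instead of appending one character at a time.
repeat_x() {
    awk -v n="$1" 'BEGIN {
        s = ""
        chunk = "x"                          # always holds 2^k copies of "x"
        while (n > 0) {
            if (n % 2 == 1) s = s chunk      # this bit of n is set: append chunk
            n = int(n / 2)
            if (n > 0) chunk = chunk chunk   # double the chunk for the next bit
        }
        print s
    }'
}
```

For example, `repeat_x 100000000 >/dev/null` could be run under `time` for a comparison with the commands above.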



   $ time awk 'BEGIN{s=sprintf("%*s",100000000,""); print s}' | sed 's/ /x/g' >/dev/null

   real    0m40.100s
   user    0m39.608s
   sys     0m0.421s

so GNU sed is apparently just as slow. `tr` is fast, as you'd expect, but I know that's comparing apples to oranges:

   $ time awk 'BEGIN{s=sprintf("%*s",100000000,""); print s}' | tr ' ' 'x' >/dev/null

   real    0m0.889s
   user    0m0.452s
   sys     0m0.577s
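Since the timings only mean something if the approaches produce the same string, here is a small sketch that wraps the three variants from this thread as shell functions so they can be compared at a small size first. The function names and the `-v n=...` parameterization are my own additions for illustration; the awk bodies are otherwise the commands shown above, with a `print s` added where needed.

```shell
# Hypothetical wrappers around the three approaches discussed in this thread.
# Each prints a line of n "x" characters; n is passed in with -v.

# 1. Append one character per loop iteration.
by_loop() { awk -v n="$1" 'BEGIN{for(i=1;i<=n;i++) s = s "x"; print s}'; }

# 2. Pad with spaces via sprintf, then replace them with gsub().
by_gsub() { awk -v n="$1" 'BEGIN{s=sprintf("%*s",n+0,""); gsub(/ /,"x",s); print s}'; }

# 3. Pad with spaces via sprintf, then translate them with tr.
by_tr()   { awk -v n="$1" 'BEGIN{s=sprintf("%*s",n+0,""); print s}' | tr ' ' 'x'; }
```

Once the outputs agree at, say, n=1000, each function can be run under `time` with the output sent to /dev/null at the full 100000000.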


