gsub() is very slow in gawk 5.1.0
From: Ed Morton
Subject: gsub() is very slow in gawk 5.1.0
Date: Wed, 14 Jul 2021 08:20:57 -0500
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0
On an online forum someone asked how to generate a string of 100,000,000
"x"s. They had tried this in a BEGIN section:
for(i=1;i<=100000000;i++) s = s "x"
and wanted to know if there was a better approach. Someone suggested:
s=sprintf("%*s",100000000,""); gsub(/ /,"x",s)
which is what I'd have suggested too, but on testing it they found
that the sprintf+gsub approach was slower than the loop in gawk 5.1.0.
While I couldn't reproduce that exactly on Cygwin, I can confirm that
the sprintf+gsub solution is much slower than I expected:
$ time awk 'BEGIN{for(i=1;i<=100000000;i++) s = s "x"}'
real 1m19.439s
user 0m28.562s
sys 0m50.811s
$ time awk 'BEGIN{s=sprintf("%*s",100000000,""); gsub(/ /,"x",s)}'
real 0m36.604s
user 0m36.093s
sys 0m0.390s
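As a sanity check, here is a scaled-down sketch confirming that the two BEGIN blocks timed above build the same string, so the timings compare like with like. It uses n=10 instead of 100,000,000 (an arbitrary choice for illustration) and assumes a POSIX shell plus an awk whose printf accepts the `*` dynamic width, as gawk and BSD awk do:

```shell
#!/bin/sh
# Scaled-down comparison: n=10 instead of 100000000 (illustrative value).
n=10

# Approach 1: append one "x" per loop iteration.
loop=$(awk -v n="$n" 'BEGIN { for (i = 1; i <= n; i++) s = s "x"; print s }')

# Approach 2: build n spaces with the dynamic-width %*s format,
# then replace every space with "x" via gsub().
subst=$(awk -v n="$n" 'BEGIN { s = sprintf("%*s", n, ""); gsub(/ /, "x", s); print s }')

if [ "$loop" = "$subst" ]; then
  echo "same: $loop"
else
  echo "different: '$loop' vs '$subst'"
fi
```

Both variables should hold ten "x"s, so the only difference at full scale is how long each approach takes.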
If I remove the gsub() from the sprintf+gsub script, it runs in half a second:
$ time awk 'BEGIN{s=sprintf("%*s",100000000,"")}'
real 0m0.423s
user 0m0.171s
sys 0m0.202s
so the gsub() itself is taking over 36 seconds to run. Someone else ran
the script on a Mac with BSD awk 20070501 and got:
$ time awk 'BEGIN {s = sprintf("%*s", 100000000, ""); gsub(/ /, "x", s)}'
real 0m1.744s
user 0m1.645s
sys 0m0.098s
i.e. it ran in under 2 seconds, and yet another person reported that the
gawk sprintf+gsub solution took 23.5 seconds on their Mac.
So something is causing gsub() in gawk 5.1.0 to run very slowly for
this case.
Ed.