Re: gsub() is very slow in gawk 5.1.0
From: arnold
Subject: Re: gsub() is very slow in gawk 5.1.0
Date: Thu, 15 Jul 2021 00:41:49 -0600
User-agent: Heirloom mailx 12.5 7/5/10
Hi Ed.
Ed Morton <mortoneccc@comcast.net> wrote:
> I just tried the same script on my Mac using BSD awk 20200816 and it
> only took 1.4 seconds to run. Unfortunately I can't install gawk or any
> other awk on that machine to test with but I 100% believe the 2 other
> people who posted at https://stackoverflow.com/a/68371463/1745001 saying
> gawk 5.1.0 on their Macs took 23.5 secs and almost 30 secs respectively.
Once again, you have to compare apples to apples. Part of it is
definitely related to how much RAM you have. I bet that Mac of
yours has 32 Gig or more on it.
On my personal 8 Gig system, I had to kill all other awks. My work laptop
(Ubuntu 18.04) has 16 Gig. Here's the data:
$ cat t2.awk
BEGIN {
    s=sprintf("%*s",1000000000,""); gsub(/ /,"x",s)
}
$ ./nawk --version
awk version 20210215
$ time ./nawk -f t2.awk
real 2m2.270s
user 0m12.061s
sys 1m50.162s
$ time ./gawk -f t2.awk
real 3m8.238s
user 3m6.167s
sys 0m1.856s
Gawk is about 50% slower than nawk in wall-clock time (3m8s vs. 2m2s),
but not 10 or 15 times slower.
The gawk regex routines are much more heavy-weight than what's
in nawk. And no, I can't substitute in some other regex library.
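For reference, the "50%" figure can be checked from the two "real" times
above. A minimal sketch, where the helper name ratio is purely illustrative
and some POSIX awk is assumed to be on the PATH:

```shell
# ratio A B: print A/B to two decimal places (hypothetical helper).
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# gawk: 3m8.238s = 188.238 s; nawk: 2m2.270s = 122.270 s
ratio 188.238 122.270    # prints 1.54
```

So by wall-clock time gawk took roughly one and a half times as long as
nawk on this run.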
Interestingly:
$ (export LC_ALL=C ; time ./gawk -f t2.awk)
real 2m30.100s
user 2m28.561s
sys 0m1.484s
So we see that gawk is comparable to nawk (2m30s vs. 2m2s real time)
when told not to worry about multibyte locales.
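To try the same comparison without needing several gigabytes of RAM,
something like the following might do. This is only a sketch: run_gsub and
its arguments are made-up names, the string length is a parameter rather
than the 1000000000 used in t2.awk, and smaller sizes may not show the same
ratios.

```shell
# run_gsub AWK N: build a string of N spaces, replace each with "x",
# and print how many replacements gsub() made (hypothetical helper).
run_gsub() {
    "$1" -v n="$2" 'BEGIN {
        s = sprintf("%*s", n, "")
        print gsub(/ /, "x", s)
    }'
}

run_gsub awk 1000    # prints 1000
```

In bash, each case can then be timed the same way as above, for example
time run_gsub ./gawk 100000000 for the default locale and
(export LC_ALL=C ; time run_gsub ./gawk 100000000) for the C locale.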
I think we can put this to rest now.
Thanks,
Arnold