bug-gawk

Re: gsub() is very slow in gawk 5.1.0


From: arnold
Subject: Re: gsub() is very slow in gawk 5.1.0
Date: Thu, 15 Jul 2021 00:41:49 -0600
User-agent: Heirloom mailx 12.5 7/5/10

Hi Ed.

Ed Morton <mortoneccc@comcast.net> wrote:

> I just tried the same script on my Mac using BSD awk 20200816 and it 
> only took 1.4 seconds to run. Unfortunately I can't install gawk or any 
> other awk on that machine to test with but I 100% believe the 2 other 
> people who posted at https://stackoverflow.com/a/68371463/1745001 saying 
> gawk 5.1.0 on their Macs took 23.5 secs and almost 30 secs respectively.

Once again, you have to compare apples to apples. Part of it is
definitely related to how much RAM you have. I bet that Mac of
yours has 32 Gig or more on it.

On my personal 8 Gig system, I had to kill all other awks.  My work laptop
(Ubuntu 18.04) has 16 Gig. Here's the data:

$ cat t2.awk
BEGIN {
        s=sprintf("%*s",1000000000,""); gsub(/ /,"x",s)
}

$ ./nawk --version
awk version 20210215

$ time ./nawk -f t2.awk 

real    2m2.270s
user    0m12.061s
sys     1m50.162s

$ time ./gawk -f t2.awk

real    3m8.238s
user    3m6.167s
sys     0m1.856s

Gawk is about 50% slower than nawk in wall-clock time (3m8s vs. 2m2s),
but not 10 or 15 times slower.
The gawk regex routines are much more heavy-weight than what's
in nawk.  And no, I can't substitute in some other regex library.

Interestingly:

$ (export LC_ALL=C ; time ./gawk -f t2.awk)

real    2m30.100s
user    2m28.561s
sys     0m1.484s

So we see that gawk gets much closer to nawk (2m30s vs. 2m2s) when
told not to worry about multibyte locales.
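
For anyone who wants to repeat the locale comparison without a billion-character
string, here is a minimal sketch of a scaled-down version. The names N, AWK and
run_gsub are made up for illustration and are not part of the test above. The
smaller default size is also an assumption, so the absolute times will not match
the figures in this message.

```shell
# Scaled-down variant of t2.awk. N and AWK are illustrative knobs that
# do not appear in the original test; adjust them to taste.
N=${N:-1000000}     # string length; the original test used 1000000000
AWK=${AWK:-awk}     # e.g. AWK=./gawk or AWK=./nawk

# run_gsub LOCALE: run the gsub() test with LC_ALL set to LOCALE.
# An empty LOCALE leaves the choice to LANG and the other LC_* variables.
run_gsub() {
    LC_ALL=${1-} "$AWK" -v n="$N" 'BEGIN {
        s = sprintf("%*s", n, "")     # build a string of n spaces
        c = gsub(/ /, "x", s)         # gsub() returns the replacement count
        print c, length(s)            # sanity check: both should equal n
    }'
}
```

In bash, `time run_gsub ""` and `time run_gsub C` then give the two timings to
compare. Raise N until the run takes long enough to measure on your machine.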

I think we can put this to rest now.

Thanks,

Arnold


