[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [bug-gawk] Is the speed of array index checking same as string compa
From: |
Peng Yu |
Subject: |
Re: [bug-gawk] Is the speed of array index checking same as string comparison? |
Date: |
Mon, 26 Nov 2018 16:17:11 -0600 |
> Awk arrays are hash tables. As they fill up, access will slow down.
> That said, gawk goes to a lot of trouble to try to keep access fast,
> with a small number of collisions per bucket on average. I suspect you'd
> have to REALLY load an array before you'd notice a big difference
> (millions of elements on modern machines).
Here is my test case. It seems that only when the hash size reaches a
billion, it can cause more than 80% slow-down. Before that, the
slowdown is only 10%~20% which is quite acceptable in practice. Isn't
it?
ns=(
10000
100000
1000000
10000000
100000000
1000000000
)
m=100000000
for n in "address@hidden"
do
time1=$(( time awk -v n="$n" 'BEGIN { for(i=1;i<=n;++i) d[i]=""}' ) 2>&1)
time2=$(( time awk -v n="$n" -v m="$m" 'BEGIN { for(i=1;i<=n;++i)
d[i]=""; d["NA"]=1; for(i=1;i<=m;++i) ("NA" in d) }') 2>&1)
time3=$(( time awk -v n="$n" -v m="$m" 'BEGIN { for(i=1;i<=n;++i)
d[i]=""; x="NA"; for(i=1;i<=m;++i) x=="NA" }' ) 2>&1)
printf '%s\t%s\t%s\t%s\t%s\n' "$n" "$time1" "$time2" "$time3"
"$(bc -l <<< "($time2-$time1)/($time3-$time1)")"
done
10000 0.016 7.430 6.313 1.17738605685246942988
100000 0.045 7.404 6.354 1.16642891107941036614
1000000 0.220 7.581 6.471 1.17757158854583266677
10000000 1.938 9.298 8.170 1.18100128369704749679
100000000 19.468 26.804 25.726 1.17225950782997762863
1000000000 200.543 205.224 203.093 1.83568627450980392156
--
Regards,
Peng