[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [bug-gawk] Is the speed of array index checking same as string compa
From: |
Andrew J. Schorr |
Subject: |
Re: [bug-gawk] Is the speed of array index checking same as string comparison? |
Date: |
Tue, 27 Nov 2018 08:46:22 -0500 |
User-agent: |
Mutt/1.5.21 (2010-09-15) |
On Mon, Nov 26, 2018 at 05:03:50PM -0600, Peng Yu wrote:
> The above test case has a bug --- the baseline was not correct.
>
> Here is the correct one. It seems that there is only a constant factor
> (~40%) slow-down no matter how large the hash size is. Is it?
>
> ns=(
> 10
> 100
> 1000
> 10000
> 100000
> 1000000
> 10000000
> 100000000
> 1000000000
> )
> m=100000000
> for n in "address@hidden"
> do
> time1=$(( time awk -v n="$n" -v m="$m" 'BEGIN {
> for(i=1;i<=n;++i) d[i]=""; for(i=1;i<=m;++i) {} }' ) 2>&1)
> time2=$(( time awk -v n="$n" -v m="$m" 'BEGIN {
> for(i=1;i<=n;++i) d[i]=""; d["NA"]=1; for(i=1;i<=m;++i) ("NA" in d) }'
> ) 2>&1)
> time3=$(( time awk -v n="$n" -v m="$m" 'BEGIN {
> for(i=1;i<=n;++i) d[i]=""; x="NA"; for(i=1;i<=m;++i) x=="NA" }' )
> 2>&1)
> printf '%s\t%s\t%s\t%s\t%s\n' "$n" "$time1" "$time2" "$time3"
> "$(bc -l <<< "($time2-$time1)/($time3-$time1)")"
> done
> 10 3.695 7.461 6.350 1.41845574387947269303
> 100 3.630 7.420 6.297 1.42107236595425571803
> 1000 3.694 7.431 6.345 1.40965673330818559034
> 10000 3.686 7.424 6.344 1.40632054176072234762
> 100000 3.693 7.440 6.347 1.41183119819140919366
> 1000000 3.776 7.576 6.496 1.39705882352941176470
> 10000000 5.692 9.302 8.230 1.42237982663514578408
> 100000000 23.647 26.845 25.913 1.41129744042365401588
> 1000000000 201.134 204.302 203.471 1.35558408215661103979
There are 3 different array back-end implementations used in gawk to optimize
performance: cint (consecutive positive integer indices), int (integer
indices), and str (string indices). So your measurements are not generally
valid. In this case, I think a cint array is created, but then there's a
lookaside array for strings that are later inserted. So I fear that you are
not measuring what you think you are measuring.
Regards,
Andy