Re: NaN-toolbox much faster now
Jaroslav Hajek
Sat, 14 Mar 2009 11:39:18 +0100
On Thu, Mar 12, 2009 at 5:13 PM, Alois Schlögl <address@hidden> wrote:
> The following improvements have been included in the NaN-toolbox.
>
> - sumskipnan_mex.mex has been optimized for speed (minimizing cache
> misses, reducing loop overhead)
>
> - a flag is set if any NaN occurs in the data. The flag can be checked
> (and reset) with the function FLAG_NANS_OCCURED(). This enables flexible
> control over checks for NaN. (You can check after every call, or only at
> the end of your script).
>
> - the performance of var, std, and meansq has been improved.
>
> A performance comparison between the NaN-toolbox and the corresponding
> standard Octave functions (see script below) shows the following results
> (time in [s]):
>
>
>  with NaN-tb   w/o NaN-tb      ratio
>      0.25884      3.56726   13.78183   mean(x,1)/nanmean(x,1)
>      0.36784      3.32899    9.05020   mean(x,2)/nanmean(x,2)
>      0.30019      6.62467   22.06789   std(x,0,1)
>      0.40114      2.23262    5.56561   std(x,0,2)
>      0.28681      6.40276   22.32407   var(x,0,1)
>      0.40269      2.18056    5.41505   var(x,0,2)
>      0.28175      4.05612   14.39598   meansq(x,1)
>      0.40703      4.19346   10.30248   meansq(x,2)
>      0.25930      0.19884    0.76683   sumskipnan(x,1)/sum(x,1)
>      0.30624      0.24179    0.78955   sumskipnan(x,2)/sum(x,2)
>
>
> A performance improvement by factors as high as 22 can be seen, and
> sumskipnan() is only about 25% slower than sum().
>
> Of course, sumskipnan could also improve the speed of functions like
> nanmean, nanstd, etc. Maybe you want to consider including sumskipnan in
> standard octave.
>
I repeated your experiment using current Octave tip (-O3
-march=native, Core 2 Duo @ 2.83GHz):
tic-toc time
                     with NaN-tb   w/o NaN-tb     ratio
mean(x,1)               0.108911     0.090389   0.82993
mean(x,2)               0.132629     0.091657   0.69108
std(x,0,1)              0.114568     0.915853   7.99397
std(x,0,2)              0.163950     0.955799   5.82982
var(x,0,1)              0.112384     0.883821   7.86431
var(x,0,2)              0.163973     0.921007   5.61683
meansq(x,1)             0.112379     0.110276   0.98129
meansq(x,2)             0.163682     0.114233   0.69790
sum(skipnan)(x,1)       0.096581     0.082247   0.85159
sum(skipnan)(x,2)       0.101545     0.089742   0.88376

cputime
                     with NaN-tb   w/o NaN-tb     ratio
mean(x,1)               0.108007     0.088005   0.81481
mean(x,2)               0.136008     0.088005   0.64706
std(x,0,1)              0.112007     0.900056   8.03571
std(x,0,2)              0.164011     0.956060   5.82924
var(x,0,1)              0.112007     0.884055   7.89285
var(x,0,2)              0.164010     0.924058   5.63416
meansq(x,1)             0.116007     0.092006   0.79311
meansq(x,2)             0.160010     0.116007   0.72500
sum(skipnan)(x,1)       0.100006     0.080005   0.80000
sum(skipnan)(x,2)       0.100007     0.092006   0.92000

It can be seen that the penalty for skipping NaNs is mostly within
20-30%, and smaller for column-oriented reductions.
The speed-up factors of roughly 5 to 8 for std and var are caused by the
single-sweep computation done in sumskipnan.
This becomes apparent when less random data are supplied and the
NaN toolbox reverts to a backup algorithm (which is what Octave always
does) - relative error on the order of 10^-4:
tic-toc time
                     with NaN-tb   w/o NaN-tb     ratio
mean(x,1)               0.108613     0.089788   0.82668
mean(x,2)               0.132721     0.089979   0.67796
std(x,0,1)              1.362765     0.876386   0.64309
std(x,0,2)              1.500724     0.914380   0.60929
var(x,0,1)              1.366353     0.880742   0.64459
var(x,0,2)              1.499243     0.913636   0.60940
meansq(x,1)             0.115758     0.094084   0.81277
meansq(x,2)             0.163625     0.091950   0.56196
sum(skipnan)(x,1)       0.097873     0.082200   0.83986
sum(skipnan)(x,2)       0.102086     0.089619   0.87788

cputime
                     with NaN-tb   w/o NaN-tb     ratio
mean(x,1)               0.108007     0.092006   0.85185
mean(x,2)               0.132008     0.088005   0.66666
std(x,0,1)              1.364085     0.876055   0.64223
std(x,0,2)              1.500094     0.916057   0.61067
var(x,0,1)              1.368086     0.880055   0.64327
var(x,0,2)              1.500093     0.916057   0.61067
meansq(x,1)             0.116007     0.092006   0.79311
meansq(x,2)             0.164011     0.092006   0.56097
sum(skipnan)(x,1)       0.096006     0.084005   0.87500
sum(skipnan)(x,2)       0.104006     0.088005   0.84615

Here the std/var computations are slowed down by some 35-45%. This is
less favorable, though certainly no disaster.
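
To illustrate the single-sweep versus two-pass trade-off discussed above, here is a minimal sketch. It assumes that "single sweep" means accumulating the sum and the sum of squares together, and that the backup is the usual two-pass formula; the function names var_single_sweep and var_two_pass are hypothetical and are not taken from the NaN toolbox or from Octave.

```octave
1;  # script marker, so the functions below can be defined in this file

# Hypothetical helper: variance from sum and sum of squares.
# Both sums can be accumulated in one loop over the data; they are
# written here as two vectorized calls for brevity.
function v = var_single_sweep (x)
  n = numel (x);
  s = sum (x);
  ss = sum (x .^ 2);
  v = (ss - s ^ 2 / n) / (n - 1);
endfunction

# Hypothetical helper: two-pass variance.
# First pass computes the mean, second pass the squared deviations.
function v = var_two_pass (x)
  n = numel (x);
  m = sum (x) / n;
  v = sum ((x - m) .^ 2) / (n - 1);
endfunction

randn ("state", 123);
x = 1 + randn (1e5, 1) * 1e-4;   # nearly constant data, as in the script

v1 = var_single_sweep (x);
v2 = var_two_pass (x);
printf ("single sweep: %.6e\n", v1);
printf ("two pass:     %.6e\n", v2);
printf ("rel. diff:    %.2e\n", abs (v1 - v2) / v2);
```

For nearly constant data the single-sweep formula subtracts two large, almost equal numbers, so it loses digits to cancellation; that is the usual reason for falling back to a two-pass computation.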
I think the Octave statistics subcommunity should discuss what they
would appreciate most. Is anyone depending on the speed of std/var?
Opinions about skipping NaNs? Given Octave's NA support, it may be
better to just skip NAs, like R does.
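
As a small illustration of the difference between skipping NAs and skipping NaNs, the sketch below uses Octave's built-in NA, isna and isnan. The variable names are made up for the example, and this is not how any existing function is implemented.

```octave
x = [1, NA, 3, NaN, 5];

skip_na  = x(! isna (x));    # drops only NA; the NaN stays in the data
skip_nan = x(! isnan (x));   # drops NA and NaN (isnan is also true for NA)

s_na  = sum (skip_na);       # still NaN, so a genuine NaN remains visible
s_nan = sum (skip_nan);      # 1 + 3 + 5

disp (s_na);
disp (s_nan);
```

Skipping only NA keeps a NaN produced by an invalid computation visible in the result, while values explicitly marked as missing are ignored.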
There were also suggestions to move the statistics functions
completely out of Octave. Personally, I'd vote to retain just the
stuff from statistics/base, because I sometimes use functions thereof
despite not being a statistician.
regards
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz
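n = 8e3;
randn("state", 123);
# first experiment: standard normal data
#x = randn(n);
# second experiment: nearly constant data (mean 1, spread 1e-4)
x = 1 + randn(n) * 1e-4;
# k=1: run with the NaN toolbox on the path; k=2: run without it
#k=1;
k=2;
# restore T and V from the previous run
load data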
t=cputime();tic; m = mean(x,1); T(k,1)=toc;V(k,1)=cputime()-t;
t=cputime();tic; m = mean(x,2); T(k,2)=toc;V(k,2)=cputime()-t;
t=cputime();tic; m = std(x,0,1); T(k,3)=toc;V(k,3)=cputime()-t;
t=cputime();tic; m = std(x,0,2); T(k,4)=toc;V(k,4)=cputime()-t;
t=cputime();tic; m = var(x,0,1); T(k,5)=toc;V(k,5)=cputime()-t;
t=cputime();tic; m = var(x,0,2); T(k,6)=toc;V(k,6)=cputime()-t;
t=cputime();tic; m = meansq(x,1); T(k,7)=toc;V(k,7)=cputime()-t;
t=cputime();tic; m = meansq(x,2); T(k,8)=toc;V(k,8)=cputime()-t;
if (k == 1)
  t=cputime();tic; m = sumskipnan(x,1); T(k,9)=toc;V(k,9)=cputime()-t;
  t=cputime();tic; m = sumskipnan(x,2); T(k,10)=toc;V(k,10)=cputime()-t;
else
  t=cputime();tic; m = sum(x,1); T(k,9)=toc;V(k,9)=cputime()-t;
  t=cputime();tic; m = sum(x,2); T(k,10)=toc;V(k,10)=cputime()-t;
endif
save data T V
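
The script above does not show how the ratio rows were computed. The reported values are consistent with dividing the k=2 row by the k=1 row, which can be sketched as follows using the first two tic-toc timings from the message (mean(x,1) and mean(x,2)):

```octave
# row 1: k = 1, row 2: k = 2; values copied from the tic-toc table
T = [0.108911, 0.132629; 0.090389, 0.091657];

ratio = T(2,:) ./ T(1,:);
printf ("%.5f ", ratio);
printf ("\n");
# prints: 0.82993 0.69108
```

The same expression applied to V would give the cputime ratios.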
