[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: A faster sum
From: |
Schloegl Alois |
Subject: |
Re: A faster sum |
Date: |
Sat, 04 Jun 2005 22:51:46 +0200 |
User-agent: |
Internet Messaging Program (IMP) 4.1-cvs |
There is a simple explanation for why sum is currently slow. It's
related to the fact that it needs to work across any dimentsion for
N-d arrays. Probably the current code for that is not so good. We
can probably optimize a few simple cases. Or maybe someone will have
a smarter way to do what we need, perhaps by doing something more
intelligent with strides in a single loop or something. I don't have
time to think about how to fix the problem just now, but the code to
look at is in the macro MX_ND_REDUCTION in liboctave/mx-inlines.cc.
jwe
I was looking into this in more detail. When running the following
script using SUMSKIPNAN.OCT compiled from
http://cvs.sourceforge.net/viewcvs.py/*checkout*/octave/octave-forge/extra/NaN/sumskipnan.cc
f = { 'sum(X)', 'sum(X,1)', 'sum(X,2)', 'sumskipnan(X)',
'sumskipnan(X,1)', 'sumskipnan(X,2)','ones(1,n)*X','X*ones(n,1)'};
K =repmat(NaN,length(f),16);
for N=[12],%0:13];
n = ceil(2^N*1.4);
X = rand(n);
for k = 1:length(f),
t = cputime;
s = eval(f{k});
K(k,N+1) = cputime-t;
fprintf(1,'%2i %2i\t%15s\t%5.2f\n',k,n,f{k},K(k,N+1)/K(1,N+1));
end;
end;
one gets this result. 1 5735 sum(X) 1.00
2 5735 sum(X,1) 1.00
3 5735 sum(X,2) 6.33
4 5735 sumskipnan(X) 1.11
5 5735 sumskipnan(X,1) 1.11
6 5735 sumskipnan(X,2) 5.17
7 5735 ones(1,n)*X 0.25
8 5735 X*ones(n,1) 0.70
SUMKSKIPNAN.OCT does not use the macro MX_ND_REDUCTION. Despite, it
shows that slowdown for DIM=2; A similar effect is shown in Matlab.
Here are the results: 1 5735 sum(X) 1.00
2 5735 sum(X,1) 0.98
3 5735 sum(X,2) 12.68
4 5735 sumskipnan(X) 1.52
5 5735 sumskipnan(X,1) 1.52
6 5735 sumskipnan(X,2) 14.62
7 5735 ones(1,n)*X 0.56
8 5735 X*ones(n,1) 0.64
We can also see that the matrix multiplication is much faster for
DIM=2. Hence, the slowdown can not only be explained only by an
inefficent implementation of SUM or SUMSKIPNAN. This suggests that the
slow down is caused mainly by the cache lines. Using ATLAS/BLAS
implementing SUM could provide a significant performance gain. However,
I'm still wondering how SUMSKIPNAN could be implemented efficiently
using ATLAS or BLAS. Any ideas ?
Alois
-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.
Octave's home on the web: http://www.octave.org
How to fund new projects: http://www.octave.org/funding.html
Subscription information: http://www.octave.org/archive.html
-------------------------------------------------------------
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- Re: A faster sum,
Schloegl Alois <=