octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: FYI: new accumarray optimizations


From: Jaroslav Hajek
Subject: Re: FYI: new accumarray optimizations
Date: Mon, 8 Feb 2010 13:13:36 +0100

On Fri, Feb 5, 2010 at 10:19 PM, Jaroslav Hajek <address@hidden> wrote:
> hi all,
>
> by the following changeset
> http://hg.savannah.gnu.org/hgweb/octave/rev/9a16a61ed43d
>
> I implemented new optimizations for accumarray.
> @sum, @min, and @max reductions are now handled by a fast O(N)
> algorithm (in the dense case).
> I also refactored the code somewhat for better general efficiency and
> optimized also the @(x) {x} reduction (i.e. just collecting the values
> into cells).
> The following code illustrates the speed-ups.
>
> n = 1000000;
> m = 1000;
> idx = ceil (m*rand (n, 1));
> val = rand (n, 1);
> try
>  accumarray; # force parse
> catch
> end_try_catch
>
> tic; accumarray (idx, val, [m, 1]); toc
> tic; accumarray (idx, val, [m, 1], @min, Inf); toc
> tic; accumarray (idx, val, [m, 1], @max); toc
> tic; accumarray (idx, val, [m, 1], @(x){x}); toc
> tic; accumarray (idx, val, [m, 1], @std); toc
>
> m = 20;
> idx = ceil (m*rand (n, 3));
> val = rand (n, 1);
>
> tic; accumarray (idx, val, [m, m, m]); toc
> tic; accumarray (idx, val, [m, m, m], @min, Inf); toc
> tic; accumarray (idx, val, [m, m, m], @max); toc
> tic; accumarray (idx, val, [m, m, m], @(x){x}); toc
> tic; accumarray (idx, val, [m, m, m], @std); toc
>
> On Core 2 Duo @ 1.83 GHz, gcc 4.3 -O3 -march=native,
> with a recent tip I get:
>
> Elapsed time is 0.011374 seconds.
> Elapsed time is 0.456557 seconds.
> Elapsed time is 0.390821 seconds.
> Elapsed time is 0.456975 seconds.
> Elapsed time is 0.597545 seconds.
> Elapsed time is 0.0489521 seconds.
> Elapsed time is 0.859555 seconds.
> Elapsed time is 0.874383 seconds.
> Elapsed time is 1.36757 seconds.
> Elapsed time is 2.39027 seconds.
>
> with the new patch I get
>
> Elapsed time is 0.0110469 seconds.
> Elapsed time is 0.00914407 seconds.
> Elapsed time is 0.00952911 seconds.
> Elapsed time is 0.336586 seconds.
> Elapsed time is 0.560295 seconds.
> Elapsed time is 0.0400701 seconds.
> Elapsed time is 0.040818 seconds.
> Elapsed time is 0.0429821 seconds.
> Elapsed time is 0.386366 seconds.
> Elapsed time is 2.09485 seconds.
>
> enjoy!
>
> P.S. trying just this:
>
> m = 100;
> idx = ceil (m*rand (n, 2));
> val = rand (n, 1);
>
> tic; accumarray (idx, val, [m, m]); toc
> tic; accumarray (idx, val, [m, m], [], 0, true); toc
>
> gives
>
> Elapsed time is 0.0295751 seconds.
> Elapsed time is 0.568987 seconds.
>
> and shows that the sparse summation significantly lags behind, so I'll
> target that next.
>


Just a follow-up:
this patch
http://hg.savannah.gnu.org/hgweb/octave/rev/3a8c13b71612
implements a special optimization for sorting of matrices known to be
valid index vectors, using the internal index cache mechanism. The
trick is that sorting an index vector of length N and maximum M can be
done efficiently in O(M+N) in both time and memory instead of
O(N*log(N)) time and O(N) memory (using a two-pass bucket sort). Since
index vector is supposed to be meant for indexing and so O(M) memory
is supposed to be taken somewhere, it seems OK to choose the O(M+N)
strategy if
M < N*log(N).

The result may be demonstrated by this simple script:

## generate two million random indices in 1:1000
a = ceil (1000 * rand (1, 2e6));
sort (1); # Avoid lookup overhead.
tic; sort (a); toc
tic; [b, i] = sort (a); toc
isindex (a); # force cache to be created.
tic; sort (a); toc
tic; [b, i] = sort (a); toc

when run with the new patch, I get:

Elapsed time is 0.256435 seconds.
Elapsed time is 0.37301 seconds.
Elapsed time is 0.0182481 seconds.
Elapsed time is 0.0765049 seconds.

i.e. a significant ~5x speed-up (the indexed sort is more important).
This can be used to further speed-up the general case of accumarray.
Using the same benchmark as before, prior to this patch I get (this
time the computer is different):

Elapsed time is 0.00733113 seconds.
Elapsed time is 0.0142481 seconds.
Elapsed time is 0.00615501 seconds.
Elapsed time is 0.200373 seconds.
Elapsed time is 0.354519 seconds.
Elapsed time is 0.0296841 seconds.
Elapsed time is 0.028862 seconds.
Elapsed time is 0.0297501 seconds.
Elapsed time is 0.244352 seconds.
Elapsed time is 1.27717 seconds.

and with the new patch I get

Elapsed time is 0.00732803 seconds.
Elapsed time is 0.00574303 seconds.
Elapsed time is 0.00605202 seconds.
Elapsed time is 0.057658 seconds.
Elapsed time is 0.205616 seconds.
Elapsed time is 0.02685 seconds.
Elapsed time is 0.0262721 seconds.
Elapsed time is 0.0273471 seconds.
Elapsed time is 0.0811741 seconds.
Elapsed time is 1.22044 seconds.

so in the simple collection case (func = @(x){x}) there is a 3-4x
speed-up, in the general case it's just about 5% as the reduction and
interpreter overhead apparently already play a major role. Amdahl's
law in action :)
For comparison, here's 3.2.4 with the same benchmark:

Elapsed time is 0.0072608 seconds.
Elapsed time is 0.262903 seconds.
Elapsed time is 0.250486 seconds.
Elapsed time is 0.307305 seconds.
Elapsed time is 0.508496 seconds.
Elapsed time is 0.236321 seconds.
Elapsed time is 0.574856 seconds.
Elapsed time is 0.581962 seconds.
Elapsed time is 1.02948 seconds.
Elapsed time is 2.0348 seconds.

I can conclude that accumarray is now significantly faster.

Just for fun, here's the result with Matlab 2007a:

Elapsed time is 0.013596 seconds.
Elapsed time is 0.344220 seconds.
Elapsed time is 0.337973 seconds.
Elapsed time is 0.339522 seconds.
Elapsed time is 0.814698 seconds.
Elapsed time is 0.021939 seconds.
Elapsed time is 0.368590 seconds.
Elapsed time is 0.362186 seconds.
Elapsed time is 0.409095 seconds.
Elapsed time is 0.628526 seconds.

it seems that just like Octave Matlab optimizes the @sum case but
doesn't do the rest of Octave's optimizations. The 5th and 10th
results however show that the interpreter overhead is significantly
smaller in Matlab. I'm not sure whether this version already does any
JIT compiling; if so, that's an explanation.

regards

-- 
RNDr. Jaroslav Hajek, PhD
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz



reply via email to

[Prev in Thread] Current Thread [Next in Thread]