octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: reduction funs optimizations + min/max question


From: Jaroslav Hajek
Subject: Re: reduction funs optimizations + min/max question
Date: Mon, 16 Feb 2009 14:28:58 +0100

On Mon, Feb 16, 2009 at 9:23 AM, Jaroslav Hajek <address@hidden> wrote:
> On Fri, Feb 13, 2009 at 9:22 PM, Jaroslav Hajek <address@hidden> wrote:
>> hi,
>>
>> this changeset: http://hg.savannah.gnu.org/hgweb/octave/rev/53b4fdeacc2e
>> reimplements the reduction and cumulative reduction "cores" sum, prod,
>> sumsq, cumsum and cumprod
>> for better performance.
>>
>
>
> Following similar ideas, I optimized also min/max reductions and
> any/all. A simplistic benchmark follows, as usual
> (set N to suitable value).
>
> n = 5e3;
> a = rand (n);
> tic; b = max (a); toc
> tic; b = min (a); toc
> tic; b = max (a, [], 2); toc
> tic; b = min (a, [], 2); toc
> tic; [b, i] = max (a); toc
> tic; [b, i] = min (a); toc
> tic; [b, i] = max (a, [], 2); toc
> tic; [b, i] = min (a, [], 2); toc
>
> tiny = a < 1e-2;
> huge = ! tiny;
> clear a;
>
> tic; any (tiny); toc
> tic; any (tiny, 2); toc
> tic; all (huge); toc
> tic; all (huge, 2); toc
>
> with a recent tip, I get:
>
> Elapsed time is 0.195431 seconds.
> Elapsed time is 0.19919 seconds.
> Elapsed time is 0.335377 seconds.
> Elapsed time is 0.353903 seconds.
> Elapsed time is 0.194535 seconds.
> Elapsed time is 0.199214 seconds.
> Elapsed time is 0.333951 seconds.
> Elapsed time is 0.353886 seconds.
> Elapsed time is 0.242775 seconds.
> Elapsed time is 0.244407 seconds.
> Elapsed time is 0.149176 seconds.
> Elapsed time is 0.150418 seconds.
>
> with the new patches, I get:
>
> Elapsed time is 0.0466709 seconds.
> Elapsed time is 0.0463569 seconds.
> Elapsed time is 0.0438602 seconds.
> Elapsed time is 0.0425069 seconds.
> Elapsed time is 0.0478208 seconds.
> Elapsed time is 0.0480652 seconds.
> Elapsed time is 0.03636 seconds.
> Elapsed time is 0.0462041 seconds.
> Elapsed time is 0.0008111 seconds.
> Elapsed time is 0.0244672 seconds.
> Elapsed time is 0.000770807 seconds.
> Elapsed time is 0.0245111 seconds.
>
> and the relative speed-ups (the usual definition):
>
>  319%
>  330%
>  665%
>  733%
>  307%
>  314%
>  818%
>  666%
> 29832%
>  899%
> 19253%
>  514%
>
> One more note: as you can notice, the column-oriented any/all are much
> faster than row-oriented.
> That's because the column-oriented versions use short-circuiting while
> the row-oriented do not.
> In the row-reduction any/all case, there is a trade-off between
> working by columns in a cache-coherent manner
> (that's what the current version does) and sacrificing
> short-circuiting or working by rows to get short-circuiting and
> sacrifice cache-coherency.
>

Update: as this was an interesting challenge I went ahead and
implemented an algorithm that achieves both cache-coherent access and
short-circuiting.
So with the newest patch, I get for the last four benchmarks:

Elapsed time is 0.000790119 seconds.
Elapsed time is 0.000867844 seconds.
Elapsed time is 0.00072813 seconds.
Elapsed time is 0.000805855 seconds.

i.e. row-oriented any/all reductions are now essentially as fast as
the column reductions.
I conclude that in general the reductions will now be plausibly more
efficient than they used to be.

cheers

-- 
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz


reply via email to

[Prev in Thread] Current Thread [Next in Thread]