Re: indexing expression performance

From: Nicholas Tung
Subject: Re: indexing expression performance
Date: Tue, 13 Jan 2009 23:00:06 -0500

On Tue, Jan 13, 2009 at 16:46, Francesco Potortì <address@hidden> wrote:
>I've noticed that indexing expressions on large arrays can be quite slow,

Have a look at this, which I wrote in November but which for some reason
does not appear in the list archives.  In an application of mine, I wrote
an ad hoc function to access a 5-dim matrix using the last method below,
getting a significant speedup.  I usually run 3.0.1, and I have not
checked how its speed compares with more recent Octave versions.

|Date: 11 Nov 2008 10:44:45 +0100
|From: Francesco Potortì <address@hidden>
|To: Jaroslav Hajek <address@hidden>
|CC: Octave help list <address@hidden>
|Subject: Re: slow access to consecutive matrix elements
|>On Sat, Oct 11, 2008 at 10:22 AM, Francesco Potorti`
|><address@hidden> wrote:
|>> This is a real life example that demonstrates how access to matrix
|>> elements is not optimised for the case of complete ranges in the first
|>> dimensions.  I am sending it here and not to the bug list because this
|>> example may be useful to others, at least until this case is optimised
|>> in the sources.
|>> # This is a big 5-dim matrix, about 100MB size
|>> octave> kk=rand(156,222,1,44,8);
|>> # I access a slice of it, which is sequential in memory
|>> octave> t=cputime; for ii=1:44, for jj=1:8
|>>  mm=kk(:,:,:,ii,jj); endfor, endfor, cputime-t
|>> ans =  5.9124
|>> # Using a linear index is much faster
|>> octave> span=(1:156*222);
|>>        t=cputime; for ii=1:44, for jj=1:8
|>>  mm=kk(sub2ind(size(kk),1,1,1,ii,jj)+span-1); endfor, endfor, cputime-t
|>> ans =  2.6642
|>> # Removing the call to sub2ind reaches top speed
|>> octave> cp=[1,cumprod(size(kk)(1:end-1))]; span=(1:156*222);
|>>        t=cputime; for ii=1:44, for jj=1:8
|>>  mm=kk(sum(([1,1,1,ii,jj]-1).*cp)+span); endfor, endfor, cputime-t
|>> ans =  1.4001
|>> Still, I wish access were faster yet.  Is there a reason why copying
|>> consecutive memory is so slow?  I wish I could help with optimising
|>> this, even if I am certainly not the most qualified person to do it.
|>I guess the indexing routines do deserve some attention w.r.t
|>performance. Reducing code duplication would also be nice. I have this
|>on my TODO list, but I don't think it's a good idea to do it before
|>3.2.x is forked, as such changes are, IMO, likely to introduce bugs.
|Following up on this, I realised that there is room for a further
|significant speedup:
|# Be careful to not convert ranges into matrices
|octave> cp=[1,cumprod(size(kk)(1:end-1))]; len=156*222;
|      t=cputime; for ii=1:44, for jj=1:8
|base=sum(([1,1,1,ii,jj]-1).*cp); mm=kk(base+1:base+len); endfor, endfor, cputime-t
|ans =  0.26402
|The point is, I was assuming that a range remains a range even after a
|linear transformation, while this does not appear to be the case:
|octave3.1> typeinfo(1:4)
|ans = range
|octave3.1> typeinfo(4+(1:4))
|ans = matrix
|octave3.1> typeinfo(4*(1:4))
|ans = matrix
|From the depth of my ignorance about Octave's internals, I dare say that
|it should not be too difficult to keep ranges as ranges even after sum
|or product with a scalar.  Maybe even after sum with a range with the
|same number of elements.  Am I wrong?
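The index arithmetic in the fastest snippet above (the `cp = [1, cumprod(...)]` trick that replaces `sub2ind`) can be sketched in plain Python for illustration; the helper name `sub2ind_cm` is hypothetical, not an Octave or NumPy API:

```python
def sub2ind_cm(dims, subs):
    """1-based subscripts -> 1-based linear index, column-major order,
    mirroring Octave's sub2ind."""
    # cp = [1, cumprod(size(kk)(1:end-1))] from the snippet above
    cp = [1]
    for d in dims[:-1]:
        cp.append(cp[-1] * d)
    # base offset: sum of (subscript - 1) * stride, then back to 1-based
    return 1 + sum((s - 1) * c for s, c in zip(subs, cp))

# For kk = rand(156,222,1,44,8), the slice kk(:,:,:,ii,jj) starts at
# this linear index and spans 156*222 consecutive elements:
dims = (156, 222, 1, 44, 8)
base = sub2ind_cm(dims, (1, 1, 1, 2, 3))  # ii=2, jj=3 -> 3082249
```

Because the first three dimensions are taken whole, the slice occupies the contiguous range `base : base + 156*222 - 1`, which is why the range-index version avoids any gather loop.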

Neat, I didn't know that 1-D vector indexing worked on matrices.  However, I think it may be faster for you because of the multi-dimensional aspect, whereas I am working with more values (potentially large image samples).  On Octave 3.1.51+ (built today) the following two snippets run in about the same time.  If you're doing something more complicated, please tell me.  I know the two snippets below don't compute exactly the same values...

tic, x1 = 1:299; x2 = 2:300; for i = 1:100, a(:, x1) += a(:, x2); endfor, toc
x1 = 1:89700; x2 = 301:90000; tic, for i = 1:100, a(x1) += a(x2); endfor, toc
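The linear ranges in the second snippet follow directly from column-major layout: for a 300x300 matrix, the column block a(:, 2:300) occupies exactly the consecutive linear indices 301:90000.  A minimal sketch of that arithmetic, in Python for illustration (the helper name `col_block_range` is made up here):

```python
rows, cols = 300, 300  # assuming a is 300x300, as the ranges suggest

def col_block_range(first_col, last_col):
    """Linear-index range (1-based, column-major) covered by the whole
    columns first_col..last_col of a rows-by-cols matrix."""
    start = (first_col - 1) * rows + 1  # first element of first_col
    stop = last_col * rows              # last element of last_col
    return start, stop

print(col_block_range(2, 300))  # -> (301, 90000), matching a(x2) above
print(col_block_range(1, 299))  # -> (1, 89700), matching a(x1) above
```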

As another note, it doesn't seem to matter much which way the matrix is indexed in the first snippet, i.e. a(x1, :) vs a(:, x1).

Jaroslav - I updated to hg head and it was faster (it cut the time in half for the test example), but on real code not quite as much as I'd hoped.

