octave-bug-tracker
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Octave-bug-tracker] [bug #31709] simple routine which runs 20 times slo


From: Rik
Subject: [Octave-bug-tracker] [bug #31709] simple routine which runs 20 times slower on octave vs. matlab!
Date: Sun, 21 Nov 2010 22:12:40 +0000
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.12) Gecko/20101027 Ubuntu/9.10 (karmic) Firefox/3.6.12

Follow-up Comment #7, bug #31709 (project octave):

>>I am afraid you did not read my submission

Please do not tell me what I did or did not do.  I know exactly what I did. 
Your code has some typical issues which hinder speed which I identified.  Only
after those are corrected, and the speed does not improve, can one can start
looking at whether the Octave core needs to be changed.  

>>iii) pre-allocation will not help here either.

This is a scientific/engineering question rather than one of opinion.  Don't
tell me that pre-allocation won't help, just demonstrate that it doesn't with
some code.  I will happily accept that sort of evidence.

Profiling isn't all that hard, it just involves putting tic/toc combinations
around various blocks and narrowing things down.  On my machine I find the
following times for the slow calculation:

Outer While Loop : 14.55 seconds
Inner While Loop : 1.093

So, somewhat surprisingly, it is not the tightest inner loop.  Keep
profiling.

le, lens block : 0.9601
cy assignments : 12.311

Okay, now the hot spot is clear.  85% of the execution time is being spent in
just two cy assignments.  For optimization purposes I will time both of them.

cy assignment #1 : 6.600 
cy assignment #2 : 5.978

cy assignment #2 is

cy(i,N^3+1)=le;
why not keep cy square (N^3,N^3) and use a single row vector to keep track of
le?  I added/changed
cy=sparse(N^3,N^3);
cyle = sparse (N^3,1);
and assignment #2 is now
cyle(i,1) = le;


cy assignment #2 w/cyle : 1.384

>From ~6 seconds to 1.4 seconds or a little over 400% speedup.  But I notice
that cyle always uses 100% of it's entries.  Why not pre-allocate the storage
for the sparse array since it will be used?  Second change:

cyle = spalloc (N^3, 1, N^3);


cy assignment #2 w/cyle & spalloc : 0.3957
cy assignment #2 w/cyle & full matrix : 0.3461

This speeds things up another 3X.  Since 100% of the cyle entries were filled
I tried using a regular full matrix rather than a sparse matrix.  As you can
see, this change only nets about 10% improvement.

With the improvements above as a guide I tried allocating storage for cy as
well.

cy=spalloc(N^3,N^3,N^3);


cy assignment #1 w/spalloc : 3.1131

This halves the time, but that still looks a little long.  Why not try
reversing the answer space and using column vectors instead of row vectors? 
The change in calculation can easily be undone by transposing the matrix cy at
the end of the run.

      
cy(1:le,i)=v';


cy assignment #1 w/spalloc & reversed row/cols : 0.82021

This is an 8x speed-up overall.  I agree that Octave really should treat
sparse column and row vectors equivalently and will file an optimization bug
about this on the bug tracker.  Nevertheless, you can easily code around it.

Finally, I went back and took a look at the original inner while loop.  With
all of the changes above incorporated the original benchmarking time has
changed.  This is not unusual.  Optimization involves repeatedly finding the
hotspot in the code, dealing with it, which in turn reveals another hotspot.

Inner While Loop : 0.7633

The while loop test condition checks the variable flag every single time
which is unnecessary.  If you want a loop body to always execute once, use a
do/until loop.  If you're programming in Matlab, which doesn't offer this
construct, then you can extract the body of the loop and execute it once
before starting the while loop.  Changed code:

v=i;
first = pp(i);
pp(i) = 0;
while (first ~=i)


Inner While Loop w/o flag : 0.37611
Outer While Loop after optimization : 1.79 seconds

Overall improvement was 14.55/1.79 or 8.1X.  Your 40 second code should take
about 5 seconds with these changes.  It also yielded the same answer for cy
compared to the original code.

    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?31709>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/




reply via email to

[Prev in Thread] Current Thread [Next in Thread]