## Re: pb with -ffast-math

**From**: |
Alain Baeckeroot |

**Subject**: |
Re: pb with -ffast-math |

**Date**: |
Fri, 17 Apr 2009 09:04:47 +0200 |

**User-agent**: |
KMail/1.9.9 |

On 17/04/2009 at 07:07, Jaroslav Hajek wrote:
>
> On Thu, Apr 16, 2009 at 9:51 PM, Alain Baeckeroot
> <address@hidden> wrote:
...
> > Everything is vectorised, except one for-loop (iteration over time) which
> > takes 90% of the time!
> > We are going to write this short part in C in a .oct file.
> >
>
> Could you post the relevant piece of code? Maybe there's a vectorized
> way you didn't see, or one that only works with development version,
I hope so :-). I'll try 3.1.55 (packaged in Debian experimental).
> or maybe it will be something worth a new function.
I cannot send the code, but I'll write a similar example as soon as
possible (within a few days).
Each line needs the result of the previous one.
There are no function calls.
Only arithmetic operations, and 5 tests (one max, one min, one >, one <)
and one 'if' at the beginning.
The loop looks like:

  N = 10000;
  X = zeros(N,1);   (and T1....)
  x0;
  for k = 1:N
    if (k == 1)
      Y(k) = some arithmetic (x0, y0, params);
    else
      Y(k) = some arithmetic (X(k-1), Y(k-1), params)
    endif
    T1(k) = some arithmetic (X(k-1), Y(k), params)
    T2(k) = some arithmetic (T1(k), X(k-1), Y(k), params)
    T3(k) = (T2(k) > 0) * T2(k) + (T1(k) < 0) * T1(k)
    T20(k) = max (T19(k) * T19(k), T18(k) * T18(k))
    X(k) = arithmetic (T20(k), Y(k))
  endfor

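Since the plan is to move this loop into C, a minimal sketch of how such a recurrence might look there follows. The arithmetic is a made-up placeholder (the real formulas are not shown in this message), and `params_t`, `pos_plus_neg` and `run_recurrence` are hypothetical names. Only the T3 step and the max of two squares follow the outline above; the chain T4..T19 is left out, and the sketch assumes the first step uses the same formula as the others, fed with x0 and y0.

```c
#include <stddef.h>

/* Hypothetical parameter set; the real params are not shown in the thread. */
typedef struct { double a; double b; } params_t;

/* Positive part of t2 plus negative part of t1, mirroring
   T3(k) = (T2(k) > 0) * T2(k) + (T1(k) < 0) * T1(k). */
double pos_plus_neg(double t1, double t2)
{
    return (t2 > 0.0 ? t2 : 0.0) + (t1 < 0.0 ? t1 : 0.0);
}

/* Run n steps of the recurrence; x and y must each hold n doubles. */
void run_recurrence(size_t n, double x0, double y0, params_t p,
                    double *x, double *y)
{
    double xp = x0, yp = y0;  /* previous step, so no k == 1 test is needed */

    for (size_t k = 0; k < n; ++k) {
        double yk = p.a * xp + p.b * yp;      /* placeholder for Y(k)  */
        double t1 = xp - yk;                  /* placeholder for T1(k) */
        double t2 = p.a * t1 + xp - yk;       /* placeholder for T2(k) */
        double t3 = pos_plus_neg(t1, t2);     /* T3(k) as in the outline */

        /* Max of two squares; t3 and t1 stand in for T19 and T18. */
        double s3 = t3 * t3, s1 = t1 * t1;
        double t20 = (s3 > s1) ? s3 : s1;

        double xk = p.b * t20 + yk;           /* placeholder for X(k) */

        x[k] = xk;
        y[k] = yk;
        xp = xk;
        yp = yk;
    }
}
```

Carrying the previous values in `xp` and `yp` takes the `k == 1` branch out of the loop body. The ternary form of T3 can differ from the multiply form when a value is NaN or infinite.
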
I put some tic/toc calls inside the loop (I don't know how to profile
Octave code); no single place takes all the time.
Very rough order of magnitude: the computation runs at about 1 Mflops,
where we expect at least 10 Mflops on a Core 2 Duo.
(The vectorised pre- and post-processing are approximately 40 times faster.)
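For reference, an Mflops figure of this kind is just operations divided by elapsed time. A small sketch, where `mflops` is a hypothetical helper and the number of floating-point operations per iteration is an assumption the caller has to supply (it is not given in this message):

```c
/* Throughput in Mflops for `iterations` loop steps, each doing
   `flops_per_iter` floating-point operations, in `seconds` of wall time. */
double mflops(double iterations, double flops_per_iter, double seconds)
{
    return iterations * flops_per_iter / seconds / 1e6;
}
```

With purely illustrative numbers, 10000 iterations of 100 operations each taking one second would come out at 1 Mflops.
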
Regards.
Alain