help-octave
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Same .m file: different results with different versions of Octave


From: Judd Storrs
Subject: Re: Same .m file: different results with different versions of Octave
Date: Tue, 20 Apr 2010 03:18:05 -0400

On Tue, Apr 20, 2010 at 1:55 AM, Jaroslav Hajek <address@hidden> wrote:
> I think we're getting closer. I started from there and made a few more
> changes that I hope should not disrupt accuracy. I removed __pow calls
> for squaring in favor of simple multiplication, and decomposed the
> sin(2*imag(x)) part into sin(imag(x)) * cos(imag(x)), precomputing all
> common subexpression. Based on this I reduced the computation to one
> of each sinh, cosh and the combined sincos call, which is the same as
> the original implementation did (but we do it for different values).

Aha! I was tearing my hair out trying to figure out how to get it
faster and was stuck. The patch I sent to the list turned out to be
about twice as slow as glibc. I made it as far as removing the pow()
and undoing the sinh(x)*cish(x) replacement for sinh(2x) an was able
to get to 30% slower but I was stuck. Replacing sin(2x) with
sin(x)*cos(x) is brilliant.

> I also included a check for the den == 0 case. Although in reality
> perhaps cos() can never be zero (at least I failed to produce an exact
> root), I think it is not guaranteed.

I'm still pondering this one. As far as I can tell, because

den == 2*(sinh(A)^2 + cos(B)^2)

For A and B real, sinh(A)^2 can only be zero when A==0. cos(B) can
only be zero for B=(n+1/2)pi. For den to be zero, both must must be
zero simultaneously. I think tanh has to be NaN+NaNi at those places
because both the real and imaginary parts limit to +inf and -inf
depending on path. The reason I hesitate is that the glibc author
thought something could be done there... I feel like maybe I'm missing
something obvious.

> I didn't yet measure the performance impact, but I'm pretty confident
> it will be negligible, given that we actually only do a few extra
> multiplications and an extra check.

I've been testing the performance with

x = -20:0.05:20 ;
[a,b] = ndgrid(x,x) ;
x = complex(a,b) ;
clear a b
for i=1:10
   tic ; tanh(x) ; toc
endfor

For the unaltered glibc I get:

Elapsed time is 0.150706 seconds.
Elapsed time is 0.153792 seconds.
Elapsed time is 0.151361 seconds.
Elapsed time is 0.153138 seconds.
Elapsed time is 0.14215 seconds.
Elapsed time is 0.141334 seconds.
Elapsed time is 0.139582 seconds.
Elapsed time is 0.141947 seconds.
Elapsed time is 0.139618 seconds.
Elapsed time is 0.139667 seconds.

Your optimized version does:

Elapsed time is 0.179921 seconds.
Elapsed time is 0.180975 seconds.
Elapsed time is 0.181409 seconds.
Elapsed time is 0.182956 seconds.
Elapsed time is 0.166545 seconds.
Elapsed time is 0.169406 seconds.
Elapsed time is 0.168018 seconds.
Elapsed time is 0.170243 seconds.
Elapsed time is 0.170556 seconds.
Elapsed time is 0.167231 seconds.

This is the unoptimized version:

Elapsed time is 0.281236 seconds.
Elapsed time is 0.284801 seconds.
Elapsed time is 0.270071 seconds.
Elapsed time is 0.268668 seconds.
Elapsed time is 0.259948 seconds.
Elapsed time is 0.256692 seconds.
Elapsed time is 0.256862 seconds.
Elapsed time is 0.261013 seconds.
Elapsed time is 0.258265 seconds.
Elapsed time is 0.256966 seconds.

Of course, it's "cheating" if we go beyond to x>22.
Or, maybe, it's "cheating" to limit ourselves to x<22 :)

x = -50:0.1:50 ;

Then glibc does

Elapsed time is 0.231459 seconds.
Elapsed time is 0.219057 seconds.
Elapsed time is 0.218268 seconds.
Elapsed time is 0.216653 seconds.
Elapsed time is 0.20223 seconds.
Elapsed time is 0.207265 seconds.
Elapsed time is 0.20144 seconds.
Elapsed time is 0.201405 seconds.
Elapsed time is 0.201688 seconds.
Elapsed time is 0.202221 seconds.

Your optimizations do:

Elapsed time is 0.16741 seconds.
Elapsed time is 0.193478 seconds.
Elapsed time is 0.184134 seconds.
Elapsed time is 0.188431 seconds.
Elapsed time is 0.173057 seconds.
Elapsed time is 0.172517 seconds.
Elapsed time is 0.172727 seconds.
Elapsed time is 0.172515 seconds.
Elapsed time is 0.161827 seconds.
Elapsed time is 0.155982 seconds.


> Unless you see any problems with this change, I'd say let's find out
> how to do the thresholds for single and long double precision (esp.
> the latter), complete the patch and submit it to glibc maintainers.

I don't see any problems except for maybe the den==0 case. I'll look
into the single and long double parts later today and verify that the
C99 requirements for +inf/+nan inputs are met. The thing about long
double is I think it's not standardized. Maybe we can assume at least
80 bits... I'll see if the rest of glibc gives any hints about that.

Did I mention that I really like your solution to the optimization :)


Cheers,

--judd


reply via email to

[Prev in Thread] Current Thread [Next in Thread]