Re: -ffast-math option at compling octave in FreeBSD ports ?
From: Jaroslav Hajek
Subject: Re: -ffast-math option at compling octave in FreeBSD ports ?
Date: Sun, 7 Dec 2008 16:58:36 +0100
On Sun, Dec 7, 2008 at 9:12 AM, Tatsuro MATSUOKA <address@hidden> wrote:
> Hello
>
> In an Octave thread in Japan, someone asked what the
> -ffast-math option in the FreeBSD ports means.
>
> I would be glad if someone could give me some information
> about it.
>
> Regards
>
> Tatsuro
I'm not exactly an expert, but I'll try to explain:
-ffast-math in GCC enables certain optimizations that can dramatically
boost performance, but may slightly violate the expected semantics of
a computation.
To get an idea of what is allowed under -ffast-math, try this simple
function with g++:
void dscal (double *x, int n, double a)
{
  for (int i = 0; i < n; i++)
    x[i] /= a;
}
Compiled to assembly using "-O3 -fomit-frame-pointer"
(I intentionally omit -funroll-loops so that the assembly stays readable),
I get (g++ 4.3, old Intel Celeron):
        movl 8(%esp), %ecx
        movl 4(%esp), %edx
        fldl 12(%esp)
        testl %ecx, %ecx
        jle .L8
        xorl %eax, %eax
        .p2align 4,,7
.L4:
        fldl (%edx,%eax,8)
        fdiv %st(1), %st
        fstpl (%edx,%eax,8)
        addl $1, %eax
        cmpl %ecx, %eax
        jne .L4
.L8:
        fstp %st(0)
        ret
whereas with "-O3 -fomit-frame-pointer -ffast-math" I get:
        movl 8(%esp), %ecx
        movl 4(%esp), %edx
        fldl 12(%esp)
        testl %ecx, %ecx
        jle .L8
        fld1
        xorl %eax, %eax
        fdivp %st, %st(1)
        .p2align 4,,7
.L4:
        fldl (%edx,%eax,8)
        fmul %st(1), %st
        fstpl (%edx,%eax,8)
        addl $1, %eax
        cmpl %ecx, %eax
        jne .L4
.L8:
        fstp %st(0)
        ret
If you can read assembly at a basic level (as I do), you can see that
in the second case the compiler computed 1/a once before the loop
(fld1, fdivp) and replaced the fdiv inside the loop with fmul. It
essentially transformed the function like this:
void dscal (double *x, int n, double a)
{
  double ainv = 1.0/a;
  for (int i = 0; i < n; i++)
    x[i] *= ainv;
}
This is much faster because division is much slower than
multiplication; the multiplying loop can also be vectorized better
using SSE instructions and loop unrolling.
However, it may produce slightly different results because, for instance, while
x / x is exactly 1 for any finite nonzero x, x * (1/x) is not always
exactly 1 (in FP math), since 1/x is itself rounded.
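To see the difference without reading assembly, here is a minimal sketch that
writes both variants by hand and counts how many elements come out different.
The names dscal_div, dscal_mul and count_differences are made up for this
illustration, and it assumes plain IEEE double arithmetic (for example SSE2 on
x86-64) and a build without -ffast-math; with -ffast-math the compiler may turn
both loops into the same code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scale by dividing every element, as in the original loop.
void dscal_div(std::vector<double>& x, double a)
{
    for (double& v : x)
        v /= a;
}

// Scale by multiplying with a precomputed reciprocal,
// i.e. the transformation the compiler applied under -ffast-math.
void dscal_mul(std::vector<double>& x, double a)
{
    const double ainv = 1.0 / a;   // rounded once, reused for every element
    for (double& v : x)
        v *= ainv;
}

// Fill two vectors with 1, 2, ..., n, scale them both ways by a,
// and return the number of positions where the results differ.
std::size_t count_differences(std::size_t n, double a)
{
    assert(a != 0.0);              // illustration only: keep the scale factor nonzero
    std::vector<double> by_div(n), by_mul(n);
    for (std::size_t i = 0; i < n; i++)
        by_div[i] = by_mul[i] = static_cast<double>(i + 1);

    dscal_div(by_div, a);
    dscal_mul(by_mul, a);

    std::size_t differing = 0;
    for (std::size_t i = 0; i < n; i++)
        if (by_div[i] != by_mul[i])
            differing++;
    return differing;
}
```

With a = 49 at least one element differs (49 / 49 is exactly 1, while
49 * (1/49) is slightly less than 1), whereas with a = 2 nothing differs,
because 1/2 is exactly representable and no rounding error is introduced.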
Another thing is that with -ffast-math, the compiler is allowed to assume
that NaNs and Infs do not occur in expressions, and thus, for
instance, to replace "x-x" by 0 (which does not hold for x=NaN or x=Inf,
where x-x is NaN).
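The following small sketch shows the IEEE behaviour that this assumption
throws away. The function names self_difference and check_ieee_behaviour are
hypothetical, and the checks are only expected to pass in a build without
-ffast-math (or -ffinite-math-only); with those flags the compiler may fold
x - x to 0 and may even optimize the isnan tests away.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Returns x - x. Under IEEE rules this is 0 for finite x,
// but NaN when x is NaN or an infinity.
double self_difference(double x)
{
    return x - x;
}

// Checks the three cases; meant for a build WITHOUT -ffast-math.
void check_ieee_behaviour()
{
    const double nan = std::numeric_limits<double>::quiet_NaN();
    const double inf = std::numeric_limits<double>::infinity();

    assert(self_difference(1.5) == 0.0);         // finite: exactly zero
    assert(std::isnan(self_difference(nan)));    // NaN - NaN is NaN
    assert(std::isnan(self_difference(inf)));    // Inf - Inf is NaN
}
```

Code that relies on NaN propagating through a computation to signal missing or
invalid data is the typical case where this assumption matters.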
HTH,
--
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz