From: Oliver Jennrich
Subject: Re: [Help-gsl] Re: C/C++ speed optimization bible/resources/pointers needed, and about using GSL...
Date: Mon, 6 Aug 2007 16:29:51 +0200
On 7/27/07, Gordan Bobic <address@hidden> wrote:
> On Fri, 27 Jul 2007, Jochen Küpper wrote:
>
> >> [...example..]
> >
> >> Using floats instead of doubles can lead to quite significant performance
> >> differences.
> >
> > On your Pentium 3, not the average number cruncher these days.
> > An Opteron or any of the modern Intel CPUs would be more appropriate.
>
> *sigh*
>
> On an x86-64 Core2/1.9GHz, CentOS/x86-64 v5, ICC v9.1.051/x86-64
> Using the small sample program I posted earlier.
> Compiled with: icc -msse3 -xP -fp-model fast=2
>
> Using floats: 2.65 seconds
> Using doubles: 5.29 seconds
>
> Twice as many floats vectorize per operation as doubles. Thus it goes
> twice as fast. How much more evidence do you require?
Now you guys have got me interested.
Here is what I tried:
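#include <stdio.h>
#include <math.h>

int main ()
{
    const float foo = 29.123;   /* not used in this version */
    unsigned int j,k;
    unsigned int i;
    /* change "double" to "float" or "long double" for the other timings */
    double a[] = {1,2,3,4,5,6,7,8};
    double b[] = {5,6,7,8,9,10,11,12};
    double c[] = {0,0,0,0,0,0,0,0};

    for (k=0;k<100000;k++){
        for (j=0;j<10000;j++){
            /* inner loop: the candidate for vectorization */
            for (i = 0; i < 8; i++)
            {
                c[ i ] = (j*k*(a[ i ]+b[ i ]));
            }
        }
    }
    /* print one result so the computation is not discarded */
    printf("%f", c[3]);
    return 0;
}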
with gcc 4.1.1
  gcc -O3 -march=pentium-m -malign-double -mfpmath=sse -msse2 -Wall -o vect vect.c -ftree-vectorize -ftree-vectorizer-verbose=5
on a
  x86 Family 6 Model 13 Stepping 8 GenuineIntel ~1862 MHz
The multiplication by j and k is just there so that -O3 doesn't optimize
the outer loops into oblivion, and to raise the overall times above the
clock noise.
The results are puzzling:
  double, no vectorization:  23.797s
  double, vectorization:     23.858s
  float,  no vectorization:  15.561s
  float,  vectorization:      5.843s
  long double, no vectorization (as SSE2 is not enough...): 33.344s
Ok, I do understand why long double is slower than double (I think).
But why does vectorization not make the slightest bit of difference
when using doubles?
> Where does this whole nonsense of "doubles are as fast as floats" come
> from? I know it's taught everywhere these days, but it is absolutely not
> true. Whoever first came up with it has a lot to answer for.
Indeed.
--
Space -- the final frontier