[Help-gsl] P4 optimisations... an update
From: Sam Halliday
Subject: [Help-gsl] P4 optimisations... an update
Date: Mon, 23 Feb 2004 16:28:00 +0000
hi there,
a year ago, i wrote these emails to the list:
http://sources.redhat.com/ml/gsl-discuss/2002-q4/msg00197.html
http://sources.redhat.com/ml/gsl-discuss/2003-q1/msg00021.html
which, at the time, gave relevant info about compiling GSL with GCC
using P4 optimisations.
i recently decided to redo my benchmarks with the gcc-3.3.3 release on a
2GHz P4 running Debian GNU/Linux... and things have changed to the
extent that i retract my recommendation. as a simple check everyone can
do, you can time how long `make check` takes to run. this is worth more
as an integrity check that everything still works than as a benchmark,
though: none of the test programs run for long enough to give
statistically meaningful timings.
no opts:
./configure
make
make check # run it once untimed first, so the timed run
# below does not include compilation times
time make check > CHECK.noopts
grep -i fail CHECK.noopts
p4 opts (on a clean tarball):
export CFLAGS="-O2 -march=pentium4 -mfpmath=sse -msse -msse2"
./configure
make
make check
time make check > CHECK.p4opts
grep -i fail CHECK.p4opts
nowadays, they both run in the same length of time (1m30s on a 2GHz P4).
the gcc team must have done something.
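since a single timed run is not much of a benchmark, one option is to repeat the timed step a few times and compare the spread. the following is only a rough sketch: `bench` is a hypothetical helper (it is not part of GSL), and it assumes a `date` that supports `+%s`, so it only resolves whole seconds.

```shell
#!/bin/sh
# bench N CMD [ARGS...]: run CMD N times, discarding its output,
# and print the wall-clock seconds taken by each run.
# hypothetical helper for illustration; whole-second resolution only.
bench() {
    runs=$1
    shift
    i=1
    while [ "$i" -le "$runs" ]; do
        start=$(date +%s)
        "$@" > /dev/null 2>&1
        end=$(date +%s)
        echo "run $i: $((end - start))s"
        i=$((i + 1))
    done
}

# example, from inside an already built GSL tree:
#   bench 5 make check
```

if the per-run times differ by more than the gap between the two builds, the comparison is not telling you anything.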
my own numerical code runs at

              mycode noopts   mycode p4opts
  GSL noopts:      25s             20s
  GSL p4opts:      25s             20s
the difference used to be about 6 or 7 seconds depending on GSL
optimisation (hence my "up to 33% faster" claim). now, i get most of the
speedup from gcc optimisation of my own code and not GSL's (which is
really weird, since this code only calls GSL). this is just an example; i
have several other programs which show the same pattern.
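to put the new timings in the same terms as the old percentage claim, the reduction in runtime can be worked out as below. this is just a sketch: `pct_faster` is a made-up helper name, and it measures "faster" as the percentage reduction in runtime (old minus new, over old), which is an assumption about how the original figure was defined.

```shell
#!/bin/sh
# pct_faster OLD NEW: percentage reduction in runtime going from
# OLD seconds to NEW seconds, rounded to a whole number.
# hypothetical helper for illustration only.
pct_faster() {
    awk -v old="$1" -v new="$2" \
        'BEGIN { printf "%.0f\n", (old - new) / old * 100 }'
}

# timings of my code without and with p4 opts, from the figures above
pct_faster 25 20   # prints 20
```

so with gcc-3.3.3 the P4 flags on the calling code cut the runtime by about 20%, and the flags used to build GSL itself make no measurable difference.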
i guess things have changed in the last year. i was on an LFS system
when i did those benchmarks with gcc-3.2.1, now i'm on Debian GNU/Linux
with gcc-3.3.3. but i doubt that the distro makes much of a difference
to this benchmark.
also, at the time i remember my code seemed to produce incorrect results
for particular sets of input parameters. i could not attribute this to any
fault in my code, and using SSE optimisations seemed to fix it. when i
ran the code just now, i did not notice this behaviour at all (with or
without P4 opts)... again, the gcc team must have changed something in
the last few releases. my guess is that i had hit a bug in gcc's 387 code
generation (the 387 being the x87 floating point unit) which has now been
fixed.
anyway, hope this info helps keep the docs up to date. it certainly
makes you wonder why intel ever made the SSE instruction sets now,
doesn't it? now that gcc's SSE code generation has stabilised, we can see
that, at least in these benchmarks, they don't even give a substantial
performance increase for numerical code which uses purely floating point
arithmetic :-/
cheers,
Sam
--
Free High School Science Texts
http://savannah.nongnu.org/projects/fhsst
Sam's Homepages
http://fommil.homeunix.org/~samuel
http://www.ma.hw.ac.uk/~samuel