[Help-gsl] P4 optimisations... an update

From: Sam Halliday
Subject: [Help-gsl] P4 optimisations... an update
Date: Mon, 23 Feb 2004 16:28:00 +0000

hi there,

a year ago, i wrote some emails to the list which, at the time, gave
relevant info about compiling GSL with GCC using P4 optimisations.

i recently decided to redo my benchmarks with the gcc-3.3.3 release on a
2GHz P4 running Debian GNU/Linux... and things have changed to the
extent where i retract my recommendation. as a simple check everyone can
do, you can time how long `make check` takes to run. this is more useful
as an integrity check that everything works than as a benchmark, though:
none of the test programs run for long enough to give decent statistics.

no opts:
  ./configure && make
  make check   # run this once untimed first, so the timed run
               # does not include compilation times
  time make check > CHECK.noopts
  grep -i fail CHECK.noopts

p4 opts (on a clean tarball):
  export CFLAGS="-O2 -march=pentium4 -mfpmath=sse -msse -msse2"
  ./configure && make   # configure picks up CFLAGS from the environment
  make check
  time make check > CHECK.p4opts
  grep -i fail CHECK.p4opts

nowadays, they both run in the same length of time (1m30s on a 2GHz P4).
the gcc team must have done something.
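
since a single `make check` run is too short for decent statistics, one
option is to repeat the timed run a few times and look at the spread. the
following is only a sketch in plain sh: `bench_runs` is a hypothetical
helper (not part of GSL or its build system), and it only has
whole-second resolution because it relies on `date +%s`.

```shell
# bench_runs N command [args...]
# Hypothetical helper: run a command N times, discard its output, and
# print the min / mean / max wall-clock time in whole seconds.
bench_runs() {
    runs=$1
    shift
    i=1
    times=""
    while [ "$i" -le "$runs" ]; do
        start=$(date +%s)
        "$@" > /dev/null 2>&1
        end=$(date +%s)
        times="$times $((end - start))"
        i=$((i + 1))
    done
    # $times is left unquoted on purpose: one elapsed time per line for awk
    printf '%s\n' $times | awk '
        { s += $1
          if (NR == 1 || $1 < min) min = $1
          if (NR == 1 || $1 > max) max = $1 }
        END { printf "runs=%d min=%ds mean=%.1fs max=%ds\n", NR, min, s / NR, max }'
}
```

for example, `bench_runs 5 make check` (after the untimed warm-up run)
prints one summary line per build, and the lines for the two builds can
then be compared directly.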

my own numerical code runs at

(columns: how my code was built; rows: how GSL was built)

             mycode noopts   mycode p4opts
GSL noopts:  25s             20s
GSL p4opts:  25s             20s

it used to be a difference of about 6 or 7 seconds depending on GSL
optimisation (hence my "up to 33% faster" claim). now i get most of the
speedup from gcc optimisation of my own code and not GSL's (which is
really weird, since this code only calls GSL). this is just one example;
i have several other programs which show the same optimisation results.
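
to put the current numbers in the same terms as the old percentage claim,
the relative change in run time can be worked out directly. this is only
a sketch: `pct_faster` is a hypothetical helper name, and the 25s and 20s
inputs are the timings quoted above (the old 33% figure is not re-derived
here).

```shell
# pct_faster OLD NEW
# Hypothetical helper: percent reduction in run time when going from
# OLD seconds to NEW seconds.
pct_faster() {
    awk -v old="$1" -v new="$2" \
        'BEGIN { printf "%.0f%%\n", (old - new) / old * 100 }'
}

pct_faster 25 20   # my code: noopts vs p4opts, prints 20%
pct_faster 25 25   # GSL: noopts vs p4opts, prints 0%
```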

i guess things have changed in the last year. i was on an LFS system
when i did those benchmarks with gcc-3.2.1, now i'm on Debian GNU/Linux
with gcc-3.3.3. but i doubt that the distro makes much of a difference
to this benchmark.

also, at the time i remember my code seemed to produce false results for
particular sets of input parameters. i could not attribute this to any
fault in my code, and using SSE optimisations seemed to fix it. when i
ran the code just now, i did not notice this behaviour at all (with or
without P4 opts)... again, the gcc team must have changed something in
the last few releases. my guess is that i had hit a bug in gcc's 387 code
(the x87 floating point unit) which has now been fixed.

anyway, hope this info helps keep the docs up to date. it certainly
makes you wonder why intel ever made the SSE instruction sets now,
doesn't it? now that the gcc SSE codebase has stabilised, we can see
that they don't even give a substantial performance increase for
numerical code which uses purely floating point arithmetic :-/

Free High School Science Texts
Sam's Homepages
