|Subject:||RE: [Bug-gnubg] Benchmarks on server class machines and resulting change requests|
|Date:||Fri, 7 Aug 2009 04:23:56 +0200|
Find attached the cleaned-up benchmark data for both the 2x Xeon 5130 and 2x Nocona machines.
I've also done new research, which now covers the impact of cache size, single-threaded vs. multithreaded binary, and number of threads. The main result graph is attached; the data is in the same spreadsheet as the two other benchmarks (OpenOffice 3.1 format), in 3 worksheet tabs.
The experiment is based on the same 5 seven-point FIBS matches used for the previous benchmarks. Two binaries were compiled, one with multithreading (GNUBGMT) and one without (GNUBGST). Both were compiled with gcc 220.127.116.11 on Debian 5.0.2, heavily optimized for core2 CPUs. SSE and SSE2 are used; the code base is gnubg.org CVS as of 2 August 2009. The hardware is a Supermicro 2x Xeon 5130 machine with 6GB DDR2-5300 memory. The machine was completely idle during testing.
The 5 matches were analyzed 4 times each, resulting in a total of 20 match evaluations at 2ply/no pruning/cubeful. All caches were cleared before each analysis. Cache size was varied from 2^1 to 2^27 bytes, resulting in 27 runs for each graph.
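To make the size of the sweep concrete, here is a minimal sketch of the parameter grid described above. It only enumerates the cache sizes and counts the evaluations; the function and constant names are made up for illustration and are not part of gnubg.

```python
# Illustrative only: names below are hypothetical, not gnubg identifiers.
MATCHES = 5   # seven-point FIBS matches
REPEATS = 4   # each match analyzed 4 times


def cache_sizes(min_exp=1, max_exp=27):
    """Return the cache sizes 2^min_exp .. 2^max_exp, one per run."""
    return [2 ** e for e in range(min_exp, max_exp + 1)]


if __name__ == "__main__":
    sizes = cache_sizes()
    evaluations_per_run = MATCHES * REPEATS
    print(len(sizes), "runs per graph,", evaluations_per_run, "match evaluations per run")
```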
* Graphs "Threads=1,2,3,4,5" were done with the MT binary (GNUBGMT) and the respective cache and thread settings, 20 matches
* Graph "No Threading" was done with GNUBGST, 20 matches
* Graph "4xNo threading á 1/4 work" was done by running 4 instances of GNUBGST in parallel, with 5 matches to analyze each
* Graph "4xThreads=1 á 1/4 work" was done by running 4 instances of GNUBGMT in parallel, each set to use one thread, with 5 matches to analyze each
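For anyone who wants to repeat the "4x" runs, the setup can be sketched roughly as follows: split the 20 match analyses into 4 batches of 5, start one process per batch at the same time, and measure wall-clock time until the last one finishes. This is only a sketch under assumptions. The helper names are hypothetical, and the command started here is a do-nothing placeholder, not a real gnubg invocation; the actual command line used for the benchmark is not shown in this mail.

```python
import subprocess
import sys
import time


def split_work(items, instances):
    """Deal items round-robin into `instances` batches of near-equal size."""
    return [items[i::instances] for i in range(instances)]


def run_parallel(commands):
    """Start all commands at once; return (wall-clock seconds, exit codes)."""
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for cmd in commands]
    codes = [p.wait() for p in procs]  # returns once the slowest instance is done
    return time.perf_counter() - start, codes


if __name__ == "__main__":
    analyses = list(range(20))           # 5 matches x 4 repetitions
    batches = split_work(analyses, 4)    # 4 instances, 5 analyses each
    # Placeholder command (assumption): replace with the real analysis command.
    placeholder = [sys.executable, "-c", "pass"]
    elapsed, codes = run_parallel([placeholder for _ in batches])
    print(f"{len(batches)} instances, {len(batches[0])} analyses each, "
          f"{elapsed:.2f}s wall clock, exit codes {codes}")
```

Timing from the parent process this way gives one wall-clock figure for the whole group rather than one per instance.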
- The "spontaneous speedup" spikes, seen especially for Threads=2, are odd. I did several runs and they didn't disappear, but showed up at different frequencies and cache size positions. I consider them bugs in the Unix time command.
- Data for Threads=6,7,8 was also collected but is not plotted because, as expected, performance decreased as the number of threads grew. The graph for Threads=5 shows that sufficiently; no need to clutter the diagram with more.
- The "4xThreads..." and "4x No Threading" runs aborted with out-of-memory errors for cachesize=2^26 and 2^27 (no surprise), thus no data for them.
I would very much like to hear some comments from you, Jonathan (the author of the threading code). Happy with what you see? Well, I think you did a good job :)
Description: PNG image