Jon,

the test was conducted on machine A (2 x Xeon 5130, Core 2 based, 4 cores
total) using defaults. The machine was idle except for running the test and
my SSH connection. The batch eval script was the one below (with the number
of threads added by a small /bin/bash script). I will repeat the tests on the
hyperthreaded machine, but probably for 16 threads rather than 64. Stay
tuned.
Ingo
---snip---
file: batch ----
set lang C
set evaluation chequerplay evaluation plies 2
set evaluation chequerplay evaluation cubeful on
set evaluation chequerplay evaluation prune off
set evaluation cubedecision evaluation plies 2
set evaluation cubedecision evaluation cubeful on
set evaluation cubedecision evaluation prune off
clear cache
clear hint
analysis clear
import mat You_vs_silent_greek_20090802162313580.mat
analyze match
[... and a lot more matches, repeating the sequence starting at "clear cache" ...]
---snip---
#!/bin/bash
LIMIT=64
TMPBATCH=/tmp/gnubgbatch
BATCH=./batch
((t = 1))
while (( t <= LIMIT ))
do
    echo "threads = $t"
    echo "set thread $t" > "$TMPBATCH"
    cat "$BATCH" >> "$TMPBATCH"
    time ./gnubg < "$TMPBATCH" > /dev/null
    ((t += 1))
done
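The batch file above repeats the block that starts at "clear cache" once per match. A minimal sketch of generating such a file from a list of .mat files, assuming each pass runs through all matches before the next pass starts (the thread does not say in which order the repeats were made). make_batch is a hypothetical helper, not a gnubg command; the command lines are copied verbatim from the batch file.

```shell
#!/bin/bash
# Hypothetical generator for a batch file like the one above: the evaluation
# settings once, then for every pass and every match file the per-match
# sequence that starts at "clear cache". Pass-by-pass ordering is assumed.
make_batch() {
    local repeat=$1
    shift
    cat <<'EOF'
set lang C
set evaluation chequerplay evaluation plies 2
set evaluation chequerplay evaluation cubeful on
set evaluation chequerplay evaluation prune off
set evaluation cubedecision evaluation plies 2
set evaluation cubedecision evaluation cubeful on
set evaluation cubedecision evaluation prune off
EOF
    local pass=1 match
    while [ "$pass" -le "$repeat" ]; do
        for match in "$@"; do
            # Start each analysis from a cleared cache, as in the original file.
            printf '%s\n' "clear cache" "clear hint" "analysis clear" \
                "import mat $match" "analyze match"
        done
        pass=$((pass + 1))
    done
}
```

For example, make_batch 4 m1.mat m2.mat m3.mat m4.mat m5.mat > batch would describe five matches analyzed 4 times each (file names hypothetical); the loop above then prepends the thread setting.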
It's not clear if you were using the hyper-threaded machine, as this might
explain the jump from 1 to 2 cores and the smaller jump to 3 and 4 cores.

If you were using machine "B", try running the test again for 1, 2, 3 and 4
threads on machine "A". Make sure the cache size is set to maximum.
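A minimal sketch of that rerun for 1 to 4 threads, written as a variant of the loop above that prints one threads,seconds CSV row per run instead of the raw "time" output. run_thread_sweep, GNUBG and BATCH are hypothetical names, and the batch file is assumed to set the cache size already (the gnubg command for that is not shown in this thread).

```shell
#!/bin/bash
# Hypothetical helper: run the gnubg batch once per thread count and print
# "threads,seconds" CSV rows. GNUBG and BATCH are illustrative defaults; the
# batch file is assumed to set the cache size already.
GNUBG=${GNUBG:-./gnubg}
BATCH=${BATCH:-./batch}

run_thread_sweep() {
    local max_threads=$1 t elapsed tmpbatch
    local TIMEFORMAT=%R    # bash's `time` then prints wall-clock seconds only
    tmpbatch=$(mktemp) || return 1
    echo "threads,seconds"
    for (( t = 1; t <= max_threads; t++ )); do
        # Same idea as the loop above: thread setting first, then the batch.
        { echo "set thread $t"; cat "$BATCH"; } > "$tmpbatch"
        # Discard gnubg's own output and capture only the timing line.
        elapsed=$( { time "$GNUBG" < "$tmpbatch" > /dev/null 2>&1; } 2>&1 )
        echo "$t,$elapsed"
    done
    rm -f "$tmpbatch"
}
```

Run as run_thread_sweep 4 > machine_a.csv; the CSV can be pasted straight into a spreadsheet.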
Jon
Ingo Macherius wrote:
> Christian, I've conducted your suggested experiment (batch eval of saved
> matches) and can confirm your answer: calibrate is not a suitable metric
> for evaluating the threading behaviour of gnubg.
>
> The batch experiment analyzed five 7pt matches 4 times each, with full
> cache cleaning. The time was taken with the unix "time" command. The
> results are much more like what one would expect:
> - Speed peaked when the number of threads equaled the number of cores
> - Adding more threads than cores slowed down the evaluation (albeit only
>   by a tiny bit)
> - The slowdown grew with the number of threads
>
> The odd finding is that there are still some anomalies:
> - Going from 1 to 2 threads more than doubles the evaluation speed
> - Adding further threads has very little effect, i.e. the gain is not
>   linear in the number of cores
> - 2, 3 and 4 threads result in speeds very close to each other, much
>   closer than expected
>
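These observations are easier to compare as speedup S(n) = T(1)/T(n) and parallel efficiency E(n) = S(n)/n, where T(n) is the wall-clock time with n threads: "more than doubles" from 1 to 2 threads means S(2) > 2 (E(2) > 1, i.e. superlinear), and a gain that is not linear in the number of cores shows up as E(n) dropping below 1. A small awk-based sketch, assuming the timings are available as threads,seconds rows with the single-thread run first; speedup_table and the input format are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read "threads,seconds" rows on stdin (optionally with
# a "threads,..." header) and print "threads,speedup,efficiency", where
# speedup = T(1) / T(n) and efficiency = speedup / n. The first data row is
# taken as the single-thread baseline T(1).
speedup_table() {
    awk -F, '
        NR == 1 && $1 == "threads" { next }   # skip an optional header line
        base == "" { base = $2 }              # first data row: T(1)
        {
            s = base / $2
            printf "%d,%.2f,%.2f\n", $1, s, s / $1
        }
    '
}
```

With made-up timings, printf '1,100\n2,50\n3,40\n4,50\n' | speedup_table prints the rows 1,1.00,1.00 then 2,2.00,1.00 then 3,2.50,0.83 then 4,2.00,0.50 (threads, speedup, efficiency).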
> I've attached a ZIP which contains the original OpenOffice 3.1 spreadsheet
> and a PDF version of the graphs with the experiment details.
>
> Thx a lot for your guidance!
>
> Ingo
>
>> -----Original Message-----
>> From: Christian Anthon [mailto:address@hidden
>> Sent: Monday, August 03, 2009 12:29 PM
>> To: Ingo Macherius
>> Cc: address@hidden
>> Subject: Re: [Bug-gnubg] Benchmarks on server class machines and
>> resulting change requests
>>
>>
>> The calibrate function sucks big time. The threaded calibrate function
>> sucks even more. I'm tempted to call it useless. I believe that you are
>> observing the following: there is some overhead involved in displaying
>> and updating the calibration, and as you increase the number of threads,
>> more and more time is allocated to evaluation and less and less to
>> overhead. If you really want to test the speed of the threading, then you
>> should analyse a match or perform a rollout.
>>
>> The original calibration was meant to calibrate certain timing functions
>> against the speed of your computer, so overhead didn't really matter.
>> That is, the function measures the speed of your computer, not
>> the speed of
>> gnubg.
>>
>> Christian.
>>
>> On Sun, Aug 2, 2009 at 5:06 PM, Ingo Macherius wrote:
>>> I have benchmarked gnubg on two server machines, with particular focus
>>> on multithreading. Both machines are headless and run Debian 5.x Lenny,
>>> kernel 2.6.26-2-amd64 #1 SMP x86_64 GNU/Linux. The hardware is:
>>> box_A: 2x Xeon 5130 @ 2GHz (4 physical cores in 2 chips)
>>> box_B: 2x Xeon Nocona @ 3GHz (2 physical cores plus 2 HT "cores" in
>>>        2 chips)
>>>
>>> I found two issues with current gnubg (latest CVS version as of
>>> August 1st 2009, compiled with gcc 4.3.2.1 with -march=native and sse2
>>> support):
>>>
>>> 1) The "calibrate" command output is off by a factor of 1000, i.e. it
>>>    reports eval/s values 1000 times too high. This also holds for the
>>>    figure reported by the official Debian binary installed via apt-get.
>>>
>>> 2) The limit of 16 threads is too low: I found that 8 threads per core
>>>    are needed to utilize the CPU power to 100%. Interestingly, this
>>>    holds for the virtual HT cores as
>>>    well.
>>>
>>> @1: Please check the timer code; the problem seems to be in timer.c.
>>> Obviously the #ifdef part for Windows is fine, but all other machines
>>> use a faulty version of the timer. I can't really suggest a solution,
>>> but here is some background info from Wikipedia:
>>> http://en.wikipedia.org/wiki/Rdtsc
>>> I would like to help fix this one by testing on the aforementioned
>>> machines under 64-bit Linux.
>>>
>>> @2: I've tested with a custom gnubg binary with the bug at @1 fixed the
>>> hard way (by hard-coding a division by 1000) and the thread limit raised
>>> to 256. While calibrate was running I monitored CPU utilization using
>>> the mpstat command.
>>>
>>> box_A peaks at about 202K eval/s with 8 threads per core (32 total),
>>> from where on the number is static until it starts decreasing again
>>> when you use hundreds of threads. Between 1 and 3 threads I see the
>>> expected gain of almost 100% per thread added. Using 4 threads lowers
>>> the throughput compared to 3 threads. Between 5 and 32 threads I see
>>> rising throughput, which is linear at first and becomes asymptotic as
>>> we get closer to 32 threads. Below 32 threads, mpstat reports
>>> significant idle times for each CPU; at 32 I see each one using 100% of
>>> the cycles.
>>>
>>> A very similar behavior is visible on box_B, despite the fact that 2 of
>>> its "cores" are virtual HT cores.
>>>
>>> Extrapolating the results suggests gnubg should increase the maximum
>>> number of threads to 64, maybe even 128 or 256. Rationale: recent server
>>> hardware with dual quad-cores has 8 cores, which should be fully
>>> utilizable only with 64 threads. The suggested 128 anticipates future
>>> improvements. As there seems to be little to no cost with higher values
>>> for max. threads, this seems like a cheap way to speed up gnubg on
>>> server class machines and quad cores.
>>>
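The extrapolation rests on one multiplication: thread limit = cores times threads per core, so 8 cores times 8 gives 64 for a dual quad-core, and 4 times 8 gives the 32 threads at which box_A peaked under calibrate. A trivial sketch of that rule of thumb; suggested_thread_limit is a hypothetical helper. Note that the later batch-analysis experiment in this thread found speed peaking when threads equaled cores, i.e. a per-core factor of 1, so the default factor of 8 reflects the calibrate measurements only.

```shell
#!/bin/bash
# Hypothetical helper: thread limit = cores x threads per core.
# The default factor of 8 is the value observed with calibrate in this
# thread; pass 1 to follow the batch-analysis result (threads = cores).
suggested_thread_limit() {
    local cores=$1
    local per_core=${2:-8}
    echo $(( cores * per_core ))
}
```

On Linux, suggested_thread_limit "$(nproc)" uses the count from GNU coreutils' nproc, which includes HT siblings as logical CPUs.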
>>> Cheers,
>>> Ingo
>>>
>>> _______________________________________________
>>> Bug-gnubg mailing list
>>> address@hidden
>>> http://lists.gnu.org/mailman/listinfo/bug-gnubg