bug-gnubg

RE: [Bug-gnubg] Benchmarks on server class machines and resulting change


From: Ingo Macherius
Subject: RE: [Bug-gnubg] Benchmarks on server class machines and resulting change requests
Date: Wed, 5 Aug 2009 01:58:55 +0200

Jon,
 
the test was conducted on machine A (2 x Core2 processor Xeon 5130 with 4 cores total) using defaults. The machine was idle except for running the test and my SSH connection. The batch eval script was the one below, with the number of threads added by a small /bin/bash script. I will repeat the tests on the hyperthreaded machine, but probably for 16 threads rather than 64. Stay tuned.
 
Ingo
 
---snip--- file: batch ----
set lang C
set evaluation chequerplay evaluation plies 2
set evaluation chequerplay evaluation cubeful on
set evaluation chequerplay evaluation prune off
set evaluation cubedecision evaluation plies 2
set evaluation cubedecision evaluation cubeful on
set evaluation cubedecision evaluation prune off
 
clear cache
clear hint
analysis clear
 
import mat You_vs_silent_greek_20090802162313580.mat
analyze match
[... and a lot more matches, repeating the sequence starting at "clear cache" ...]
---snip---
 
#!/bin/bash
LIMIT=64
TMPBATCH=/tmp/gnubgbatch
BATCH=./batch
((t = 1))
 
while (( t <= LIMIT ))
do
  echo "threads = $t"
  echo > ${TMPBATCH} "set thread $t"
  cat ${BATCH} >> ${TMPBATCH}
  time ./gnubg < ${TMPBATCH} > /dev/null
  ((t += 1))
done
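
The elapsed times printed by the loop above can be turned into speedup and parallel-efficiency figures, which makes it easier to see where adding threads stops helping. The following is a minimal sketch, not part of gnubg: speedup_table is a hypothetical helper name, and it assumes the timings have already been collected as lines of "<threads> <elapsed_seconds>" (for example by setting bash's TIMEFORMAT=%R so that "time" prints only the real time in seconds).

```shell
#!/bin/bash
# Hypothetical helper (not part of gnubg).
# Reads lines of "<threads> <elapsed_seconds>" on stdin and prints, per line:
#   threads  elapsed  speedup-vs-1-thread  efficiency (speedup / threads)
speedup_table() {
  awk '
    NF < 2 { next }                      # skip blank or incomplete lines
    { n++; t[n] = $1; s[n] = $2 }        # remember every measurement
    $1 == 1 { base = $2 }                # the 1-thread run is the baseline
    END {
      if (base == "") {
        print "no 1-thread baseline found" > "/dev/stderr"
        exit 1
      }
      for (i = 1; i <= n; i++)
        printf "%d %.2f %.2f %.2f\n", t[i], s[i], base / s[i], base / (s[i] * t[i])
    }'
}
```

Usage, with placeholder numbers that are not measurements from these machines: printf '1 100\n2 50\n4 50\n' | speedup_table. An efficiency close to 1.00 means the extra threads are fully used; a value that drops sharply between two thread counts marks the point where scaling breaks down.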
 
-----Original Message-----
From: Jonathan Kinsey [mailto:address@hidden
Sent: Tuesday, August 04, 2009 9:41 AM
To: address@hidden
Cc: address@hidden; address@hidden
Subject: Re: [Bug-gnubg] Benchmarks on server class machines and resulting change requests

It's not clear whether you were using the hyper-threaded machine; if so, this might
explain the jump from 1 to 2 cores and the smaller jump to 3 and 4 cores.

If you were using machine "B", try running the test again for 1,2,3,4 threads on
machine "A". Make sure the cache size is set to maximum.

Jon
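
A rerun restricted to a handful of thread counts, as suggested above, could be scripted along these lines. This is only a sketch under stated assumptions: make_batch and run_sweep are hypothetical helper names, the GNUBG and BATCH defaults mirror the paths used in the sweep script elsewhere in this thread, and the "set thread N" line is copied from that script. The cache-size setting is deliberately left out because the exact command is not given in this thread; it would have to be added to the batch file by hand.

```shell
#!/bin/bash
# Hypothetical helpers (not part of gnubg).

# Print a gnubg batch for one thread count: the "set thread N" line used by
# the sweep script in this thread, followed by the shared batch file.
make_batch() {
  local threads=$1 batch=$2
  echo "set thread $threads"
  cat "$batch"
}

# Time one gnubg run per requested thread count (default: 1 2 3 4).
# GNUBG and BATCH are assumptions; override them to match your setup.
run_sweep() {
  local gnubg=${GNUBG:-./gnubg} batch=${BATCH:-./batch} t
  [ $# -eq 0 ] && set -- 1 2 3 4
  for t in "$@"; do
    echo "threads = $t"
    # bash's "time" keyword reports the elapsed time on stderr
    time "$gnubg" < <(make_batch "$t" "$batch") > /dev/null
  done
}
```

Usage: run_sweep with no arguments covers 1 to 4 threads; run_sweep 1 2 4 8 16 would cover a sparser set on a machine with more cores.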

Ingo Macherius wrote:
> Christian, I've conducted your suggested experiment (batch eval of saved matches) and can confirm your answer. Calibrate is not a suitable metric to evaluate threading behaviour for gnubg.
>
> The batch experiment analyzed five 7pt matches 4 times each, with full cache clearing between runs. The time was taken with the unix "time" command. The results are much more like what one would expect:
> - Speed peaked when the number of threads equaled the number of cores
> - Adding more threads than cores slowed down the evaluation (albeit only by a tiny bit)
> - The slowdown grew with the number of threads
>
> The odd finding is that there still are some anomalies, which are:
> - Going from 1 to 2 threads more than doubles the evaluation speed
> - Adding more threads after that has very little effect, i.e. the gain is not linear in the number of cores
> - 2, 3 and 4 threads result in speeds very close to each other, much closer than expected
>
> I've attached a ZIP which contains the original OpenOffice 3.1 spreadsheet and a PDF version of the graphs with the experiment details.
>
> Thx a lot for your guidance!
>
> Ingo
>
>> -----Original Message-----
>> From: Christian Anthon [mailto:address@hidden
>> Sent: Monday, August 03, 2009 12:29 PM
>> To: Ingo Macherius
>> Cc: address@hidden
>> Subject: Re: [Bug-gnubg] Benchmarks on server class machines
>> and resulting change requests
>>
>>
>> The calibrate function sucks big time. The threaded calibrate
>> function sucks even more. I'm tempted to call it useless. I
>> believe that you are observing the following: There is some
>> overhead involved in displaying and updating the calibration,
>> and as you are increasing the number of threads more and more
>> time is allocated to evaluation and less and less to
>> overhead. If you really want to test the speed of the
>> threading then you should analyse a match or perform a rollout.
>>
>> The original calibration was meant to calibrate certain
>> timing functions against the speed of your computer, so
>> overhead didn't really matter. That is, the function measures
>> the speed of your computer, not the speed of gnubg.
>>
>> Christian.
>>
>> On Sun, Aug 2, 2009 at 5:06 PM, Ingo Macherius wrote:
>>> I have benchmarked gnubg on two server machines, with particular focus
>>> on multithreading. Both machines are headless and run Debian 5.x
>>> Lenny, Kernel 2.6.26-2-amd64 #1 SMP x86_64 GNU/Linux. The hardware is:
>>> box_A: 2xXeon 5130 @ 2GHz (4 physical cores in 2 chips)
>>> box_B: 2xXeon Nocona @ 3GHz (2 physical cores plus 2 HT "cores" in 2
>>> chips)
>>>
>>> I found two issues with current gnubg (latest CVS version as of August
>>> 1st 2009, compiled with gcc 4.3.2.1 with -march=native and sse2
>>> support):
>>>
>>> 1) The "calibrate" command output is off by a factor of 1000, i.e.
>>> reports eval/s values 1000 times too high. This holds for the figure
>>> reported in the official Debian binary installed via apt-get.
>>>
>>> 2) The limit of 16 threads is too low, I found that to utilize the CPU
>>> power to 100% 8 threads per core are needed. Interestingly this holds
>>> for the virtual HT cores as well.
>>>
>>> @1: Please check the timer code, the problem seems to be in timer.c.
>>> Obviously the #ifdef part for Windows is fine, but all other machines
>>> use a faulty version of the timer. I can't really suggest a solution,
>>> but here is some background info from Wikipedia:
>>> http://en.wikipedia.org/wiki/Rdtsc
>>> I would help to fix this one by testing on the aforementioned machines
>>> under 64 bit Linux.
>>> @2: I've tested with a custom gnubg binary with the bug at @1 fixed
>>> the hard way, by hardcoding a division by 1000, and with the thread
>>> limit raised to 256. While calibrate was running I monitored CPU
>>> utilization using the mpstat command.
>>>
>>> box_A peaks at about 202K eval/s with 8 threads per core (32 total),
>>> from where on the number is static until it starts decreasing again
>>> when you use hundreds of threads. Between 1 and 3 threads I see the
>>> expected gain of almost 100% per thread added. Using 4 threads
>>> lowers the throughput compared to 3 threads. Between 5 and 32
>>> threads I see rising throughput which is linear at first and becomes
>>> asymptotic as we get closer to 32 threads. Below 32 threads, mpstat
>>> reports significant idle times for each CPU; at 32 I see each is using
>>> 100% of the cycles.
>>>
>>> A very similar behavior is visible on box_B, despite the fact that 2 of
>>> its "cores" are virtual HT cores.
>>>
>>> Extrapolating the results suggests gnubg should increase the limit for
>>> the max. number of threads to 64, maybe even 128 or 256. Rationale:
>>> recent server hardware with dual quadcores has 8 cores, which at
>>> 8 threads per core should be fully utilizable only with 64 threads.
>>> The suggested 128 anticipates future improvements. As there seems to
>>> be little to no cost with higher values for max. threads, this seems
>>> like a cheap way to speed up gnubg on server class machines and quad
>>> cores.
>>>
>>> Cheers,
>>> Ingo
>>>
>>>
>>>
>>> _______________________________________________
>>> Bug-gnubg mailing list
>>> address@hidden http://lists.gnu.org/mailman/listinfo/bug-gnubg
>>>
>>




