From: Jonathan Kinsey
Subject: Re: [Bug-gnubg] Removal of non-threaded code
Date: Tue, 2 Jun 2009 10:29:05 +0000

I've got access to a single-threaded machine and just did the same quick test with a few different builds (all on Windows with gcc 3.4.5 -O3):

set gnubgid 4HPwATDgc/ABMA:cAkNAAAAAAAA
eval

And I got these results:

                st      mt      mt(1)   mt(2)   mt(3)
Run 1           54.254  55.259  54.000  55.094  53.379
Run 2           54.098  54.972  53.974  54.755  53.096
                ------  ------  ------  ------  ------
Average         54.176  55.116  53.987  54.925  53.238
% diff vs st            1.73%   -0.35%  1.38%   -1.73%

st = single-threaded build, mt = multi-threaded build

Notes
1 Cache locking code commented out, to establish whether it is the cause of the slower times
2 Cache locking switched off via a function pointer (an attempt to speed up the mt times)
3 Same as 2 but with experimental position key code

The first two rows are separate tests and the average is below; a quick look shows that they aren't particularly accurate, but they give a rough idea.

The problem with (2) is that it would slow down runs with multiple threads, and I think we should be optimising for multi-core use. We could duplicate even more code and then do an evaluation either with or without cache locking (depending on the number of evaluation threads); this should give the same performance as the single-threaded builds. Maybe some clever use of the preprocessor could minimise the amount of duplicated source.

My rewrite of the PositionKey functions seems to give about a 3% speed increase, so together with the new sigmoid function we might have a compelling reason for people to upgrade to the latest version.

Jon

Christian Anthon wrote:
> I have timed some simple evaluations of the opening positions using
> various compile settings. The following times are reported for each of
> the compile settings:
>
> A. 3x 4-ply evaluation (clearing the cache in between, with a command that
>    is not in the present code)
> B. 3x clearing the cache without any evaluation
> C. 1000x 2-ply evaluation (clearing the cache in between)
> D. 1000x clearing the cache
>
> The lost time is from locking and unlocking the cache, I believe.
>
>                           A (s)        B (s)      C (s)        D (s)
> threaded                  146.307531   0.011090   104.297596   3.803742
> non-threaded              138.310104   0.010516    92.876412   3.614214
> threaded-sigmoidSSE       139.664481   0.011588    95.686871   3.824007
> non-threaded-sigmoidSSE   131.947215   0.010806    87.237141   3.605156
>
> from timeit import Timer
> import gnubg
>
> gnubg.command("set gnubgid 4HPwATDgc/ABMA:cAkNAAAAAAAA")
>
> # A: 3x 4-ply evaluation, clearing the cache before each one
> gnubg.command("set evaluation cube evaluation plies 4")
> t = Timer('gnubg.command("clear cache"); gnubg.command("eval")', 'import gnubg')
> print("%f" % t.timeit(3))
>
> # B: 3x clearing the cache only
> t = Timer('gnubg.command("clear cache")', 'import gnubg')
> print("%f" % t.timeit(3))
>
> # C: 1000x 2-ply evaluation, clearing the cache before each one
> gnubg.command("set evaluation cube evaluation plies 2")
> t = Timer('gnubg.command("clear cache"); gnubg.command("eval")', 'import gnubg')
> print("%f" % t.timeit(1000))
>
> # D: 1000x clearing the cache only
> t = Timer('gnubg.command("clear cache")', 'import gnubg')
> print("%f" % t.timeit(1000))
>
> On Wed, Apr 29, 2009 at 2:38 PM, Massimiliano Maini wrote:
> >
> > Jonathan Kinsey
> >
> > Massimiliano Maini wrote:
> > >
> > > Christian Anthon wrote on 29/04/2009 10:23:59:
> > >
> > > > On Wed, Apr 29, 2009 at 10:04 AM, Massimiliano Maini
> > > > address@hidden
> > > >
> > > > bug-gnubg-bounces+massimiliano.maini=amadeus.com
> > > > 28/04/2009 22:01:23:
> > > >
> > > > MaX build with single thread      : ~32400 eval/s
> > > > MaX build with MT code, 1 thread  : ~24800 eval/s
> > > > MaX build with MT code, 2 threads : ~34600 eval/s
> > > >
> > > > However, a quick rollout (648 trials, expert, full, 2 top moves of
> > > > position t60BYCButycAAA:cAnnAWAASAAA) has shown the following:
> > > >
> > > > MaX build with single thread      : 2m04s
> > > > MaX build with MT code, 1 thread  : 2m04s
> > > > MaX build with MT code, 2 threads : 1m48s
> > > >
> > > > I'm much more worried about the last two numbers here. MT code
> > > > should give close to twice the speed, or we are doing something wrong.
> > >
> > > Here at the office the PC is single core; I don't know if this explains
> > > the "poor" result. I'll check at home (dual core).
> >
> > You did say the PC was "1 core, 2 threads"; does this mean it's a
> > hyper-threaded machine? That would match a small increase for 2 threads.
>
> Yes, 1 core with hyper-threading. I wasn't really surprised by the
> small increase.
>
> > Note also that the 1-thread test will be using 2 threads (one for the
> > GUI and one for the evaluations; the GUI thread will only be redrawing
> > the screen).
>
> I ran calibrate in the command-line version and the rollout in the GUI
> one. Not sure it's a big deal, however... just a progress bar and a few
> numbers updated from time to time...
>
> > The best test would be on a simple single-core/single-processor machine;
> > these are getting quite rare, all the PCs I see are multi-core now.
>
> MaX.
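The "% diff" row in Jon's table can be checked by recomputing it as each build's two-run average relative to the st average. The short check below is written in Python to match Christian's timing script; `pct_diff`, `jon_runs`, `christian` and the other names are illustrative only and do not come from gnubg. It also applies the same percentage formula to Christian's threaded and non-threaded timings.

```python
from statistics import mean


def pct_diff(value: float, baseline: float) -> float:
    """Percentage by which value exceeds baseline (negative means faster)."""
    return (value - baseline) / baseline * 100.0


# Jon's two runs per build, copied from his table (no unit is given there).
jon_runs = {
    "st":    (54.254, 54.098),
    "mt":    (55.259, 54.972),
    "mt(1)": (54.000, 53.974),
    "mt(2)": (55.094, 54.755),
    "mt(3)": (53.379, 53.096),
}
jon_avg = {build: mean(times) for build, times in jon_runs.items()}
jon_diff = {
    build: pct_diff(avg, jon_avg["st"])
    for build, avg in jon_avg.items()
    if build != "st"
}

# Christian's timeit results in seconds, columns A, B, C, D.
christian = {
    "threaded":                (146.307531, 0.011090, 104.297596, 3.803742),
    "non-threaded":            (138.310104, 0.010516, 92.876412, 3.614214),
    "threaded-sigmoidSSE":     (139.664481, 0.011588, 95.686871, 3.824007),
    "non-threaded-sigmoidSSE": (131.947215, 0.010806, 87.237141, 3.605156),
}
# Extra time taken by the threaded build relative to the non-threaded one, per column.
threaded_overhead = {
    column: pct_diff(christian["threaded"][i], christian["non-threaded"][i])
    for i, column in enumerate("ABCD")
}

if __name__ == "__main__":
    for build, diff in jon_diff.items():
        print(f"{build:6} avg {jon_avg[build]:.3f}  {diff:+.2f}% vs st")
    for column, diff in threaded_overhead.items():
        print(f"column {column}: threaded {diff:+.2f}% vs non-threaded")
```

Rounded to two decimals, the recomputed values match the table (1.73, -0.35, 1.38, -1.73), which supports reading that row as relative to st. On Christian's numbers the same formula puts the threaded build roughly 5.8% behind on the 4-ply runs (A) and roughly 12.3% behind on the 2-ply runs (C); those figures still include the cache-clearing time measured separately in B and D.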
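Jon's note 2 and his follow-up suggestion both come down to deciding once, up front, whether cache accesses take a lock, based on how many evaluation threads are running. gnubg itself is written in C, where that choice would be a function pointer (or, as Jon suggests, preprocessor-generated duplicate code). The sketch below is only a Python stand-in for the idea under those assumptions; every name in it (`make_cache_ops`, `select_cache_ops`, the string keys) is hypothetical and not taken from gnubg.

```python
import threading
from typing import Callable, Dict, Optional, Tuple

# Hypothetical types: a position key (a plain string here) mapped to a cached value.
Cache = Dict[str, float]
Lookup = Callable[[str], Optional[float]]
Store = Callable[[str, float], None]


def make_cache_ops(cache: Cache, locked: bool) -> Tuple[Lookup, Store]:
    """Return (lookup, store) for `cache`, with or without a lock.

    The unlocked bodies are written once and the locked versions wrap them,
    so both variants come from a single piece of source.
    """
    def lookup(key: str) -> Optional[float]:
        return cache.get(key)

    def store(key: str, value: float) -> None:
        cache[key] = value

    if not locked:
        return lookup, store

    lock = threading.Lock()

    def locked_lookup(key: str) -> Optional[float]:
        with lock:
            return lookup(key)

    def locked_store(key: str, value: float) -> None:
        with lock:
            store(key, value)

    return locked_lookup, locked_store


def select_cache_ops(cache: Cache, eval_threads: int) -> Tuple[Lookup, Store]:
    """Choose the variant once: lock only when several threads share the cache."""
    return make_cache_ops(cache, locked=eval_threads > 1)


if __name__ == "__main__":
    shared: Cache = {}
    lookup, store = select_cache_ops(shared, eval_threads=4)

    def worker(start: int) -> None:
        # Each thread fills its own range of keys in the shared cache.
        for i in range(start, start + 1000):
            store(f"pos{i}", i / 1000.0)

    threads = [threading.Thread(target=worker, args=(n * 1000,)) for n in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(len(shared), lookup("pos1500"))
```

Making the choice per cache call, as here, corresponds to note 2. Jon's suggested alternative would make the same locked/unlocked choice at the level of a whole evaluation, so the single-threaded path never pays for the indirection.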