Re: [Bug-gnubg] redesign gnubg to master/slave
Tue, 23 Dec 2003 17:41:48 +0100
On Tue 23 Dec 2003 (00:38 +0100), Olivier Baur wrote:
> Hi everybody!
> I'm finally back from Africa -- back to the gnubg world!
> On 21 Dec 03, at 20:54, Joern Thyssen wrote:
> >One of the recent topics has been multi-threading. Instead of
> >multi-threading Jim suggested splitting gnubg into a GUI (master) and
> >engine (slave) which communicates over sockets.
> >Besides avoiding the problem of making the engine of gnubg thread-safe
> >it also follows the Unix design principles
> >(http://www.catb.org/~esr/writings/taoup/html), i.e.,
> As Holger pointed out, I have written a thread-safe version of gnubg;
> more precisely, a version of gnubg that can compute rollouts in
> multiple threads, taking advantage of multi-processor hosts. It can
> also communicate with other gnubg's on a TCP/IP network, extending the
> notion of multi-threaded "gnubg computing units" to "remote gnubg
> computing units".
That's good news. I hadn't looked at that branch before, as I thought
it was concerned solely with multi-threading within one process.
> The main parts of the work here:
> 1/ make gnubg thread-safe (at least in its rollout part and GUI part)
> -- that is, mainly hunting down badly behaved globals and other static
> variables that would mix up between different thread contexts;
> 2/ rewrite the rollout function so it now divides its rollout job into
> smaller 1-game rollout tasks, which are then handled and dispatched by a
> "task engine", which collects results back to the rollout function for
> result accumulation; the bad news is this is several months old, and I
> think some new features have been added to rollouts, like rollout
If it's down to one-game rollout tasks, then it should not be hard to
adapt it to the changes for interrupt/resume and stopping based on
> For those who will have a look at cvs branch-multi, I have set up my
> implementation notes at the following address (plain ascii text file):
> http://mapage.noos.fr/gnubgosx/processingunits.txt (different from the
> user-oriented HTML version that Holger already mentioned:
> It describes more technically the concepts I used, how I made gnubg
> thread-safe, and how I implemented multi-threading and task handling.
I will definitely look at both of the above.
> On 22 Dec 03, at 19:42, Jim Segrave wrote:
> >There should be a stripped down version of gnubg, call it
> >gnubg-analyser, which should be able to do the following:
> >1) Given a position, a set of dice rolls and the evaluation rules,
> > select the best move, if any exists, for each roll - sort of a
> > FindnSaveBestMove() over a more limited set.
> >2) given a position, a rollout setting, and a trial number, rollout a
> > single game and return the results - more or less
> > BasicCubefulRollout()
> >3) given a position, a dice roll and a move or just a cube decision,
> > and an analysis context, do an AnalyseMove()
> This is what I called "slave processing units"; processing units are
> threads of gnubg (typically, one per processor, on one or several
> hosts); and there's the master gnubg thread, which drives the GUI (GTK)
> and uses the available slave processing units, either local -- running
> on the host, one per processor -- or remote -- running on remote hosts,
> one per processor. The version of gnubg I wrote starts
> up in master mode (the usual gnubg), but can be switched to "slave"
> mode, ready to join a master. This can be done either in TTY or GTK.
I'd sort of like to see the analysis and rollout engine available
as a standalone process, rather than as a gnubg with different command
line options or settings. One reason I was thinking about this is that
if gnubg can be split into a 'display/save/load/play' portion and an
'analysis engine' portion, where a regular gnubg has both and a slave
has only the latter, then:
1) People who want to use the engine for things like a FIBS bot have
a simpler task.
2) A display/save/load/play only process would make a very nice start
for a FIBS or other server client program
Neither of the above is a major issue - I'm not fanatically devoted to
doing such a division.
> Please note there are also some non-trivial data to transmit along,
> like random number generator info (like seed roots or RNG state) --
> which game will be rolled out with which dice is of utmost importance to
> variance reduction.
In fact, it's the seed and which trial it is which is important. Both
of these are in the current rollout context.
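To make that concrete, here's a minimal sketch (invented for illustration; gnubg's actual RNG and rollout context are different) of the idea that a (seed, trial) pair should fully determine a trial's dice, so any slave asked to run trial N reproduces exactly the rolls the master expects:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: derive an independent deterministic dice stream
 * from (seed, trial), so the same trial yields the same rolls on any
 * host. A splitmix64-style finalizer spreads nearby trial numbers into
 * unrelated streams. */
static uint64_t mix_seed(uint64_t seed, uint64_t trial)
{
    uint64_t z = seed + trial * 0x9E3779B97F4A7C15ULL;
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

typedef struct { uint64_t state; } dice_rng;

static void dice_rng_init(dice_rng *r, uint64_t seed, uint64_t trial)
{
    r->state = mix_seed(seed, trial);
}

static int roll_die(dice_rng *r)
{
    /* xorshift64 step; any deterministic generator works for the sketch */
    r->state ^= r->state << 13;
    r->state ^= r->state >> 7;
    r->state ^= r->state << 17;
    return (int)(r->state % 6) + 1;
}
```

With this scheme the master only ships the base seed and the trial number; the slave reconstructs the dice locally, which is exactly what variance reduction needs.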
> I've also written this.
> Slaves can notify their presence either to all available master hosts
> on the local TCP/IP network, or to a single, user-chosen master. When a
> master spots a self-notifying slave, it automatically adds it as
> co-operating gnubg to its list of processing units.
> A master can also be set to connect to an arbitrary list of gnubg
> slaves, for example by feeding it a startup file containing gnubg
> "pu" instructions (type "help pu" , "help get pu", "help set pu" in
> gnubg to get the details, or some older mail I posted about it on the
> list some months ago; I need to dig the references back up)
Excellent - another task already completed.
> >During startup, the analysers should inform gnubg about their
> >approximate computing power, this allows the master to make some
> >guesses about load balancing between machines. Say someone has a
> >couple of old P-300s on a home network and a 2.4G main machine. You
> >would not want to try to split the work 33%-33%-33%, you'd probably
> >want to do something like 10% - 10% - 80% when doing analysis or
> I didn't go to these ends, but for now you can define, on the master,
> different queue sizes for each remote slave; the queue size is basically
> the number of tasks that are sent at the same time to a
> remote host. So if you're using 2 remote hosts, one being twice as fast
> as the other, you would set the queue size of the faster to 2 and
> the slower to 1, or maybe 8 and 4 if they happen to compute so fast
> that network overhead becomes too large compared to the real
> computation. Basically, it's hard to precompute how you are going to
> split the job: maybe your 10%-10%-80% ratio is not correct; maybe one
> host will crash and you'll have to split the remainder; maybe some new
> hosts might join in the middle of a long rollout. I think the best
> policy is "serve as soon as finished": feed a slave as soon as it has
> returned the results of all the tasks it was given; work on the fly.
> The "ratio" issue only pertains when there are only a few tasks
> remaining to compute in the whole job, but then it gets a very hard
> work to make a good job splitting decision: who do you give these
> remaining tasks to? do you give them all to process A who is ready to
My thinking here was about what to do when choosing plays, where you
have a position and a roll. Once you have a list of legitimate moves, you
simply count the number of moves to be evaluated and assign 1/Xth of
them in a single command to a process to be evaluated at, say, 2 or
3 ply, returning the ordering and evaluations for the plays. It
makes an interesting tradeoff, as it may evaluate more plays at a
higher depth than a single process would, since the movefilters
will be pruning over a smaller selection of plays, so it won't always
be n times faster, but the search will be wider.
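The static split described above can be sketched in a few lines (illustrative names, not gnubg code): each analyser gets a contiguous chunk of the move list, with the remainder of an uneven division spread over the first chunks.

```c
/* Sketch of the 1/Xth split: for nmoves candidate plays and nprocs
 * analysers, compute the contiguous chunk [first, first+count) that
 * analyser number `proc` (0-based) should evaluate. Illustrative only. */
static void split_moves(int nmoves, int nprocs, int proc,
                        int *first, int *count)
{
    int base  = nmoves / nprocs;   /* every analyser gets at least this many */
    int extra = nmoves % nprocs;   /* first `extra` analysers get one more  */
    *count = base + (proc < extra ? 1 : 0);
    *first = proc * base + (proc < extra ? proc : extra);
}
```

With 12 legal moves and 3 analysers this yields three chunks of 4; with 13 moves the first analyser gets 5 and the other two get 4 each.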
> >The connections should have keep-alives as part of their protocol, so
> >that the loss of one of the gnubg-analysers would be noticed before
> >the main gnubg has waited too long for answers that won't be
> I still have to add better "timeout" handling.
> For now, gnubg masters can safely handle slave disappearing from the
> network (network problem or slave crash/killed/returned to master
Without keepalives, it can take a while to see a connection as dropped.
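For reference, on POSIX systems enabling TCP keep-alives is a few setsockopt() calls. This sketch (my assumption of how it could be wired in, not code from branch-multi; the TCP_KEEPIDLE family is Linux-specific, hence the #ifdef) shortens dead-peer detection from the usual ~2-hour default to about a minute:

```c
#include <assert.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <unistd.h>

/* Enable TCP keep-alives on a master<->slave connection. With the
 * Linux-specific knobs below, a dead peer is declared after roughly
 * idle + intvl * cnt = 30 + 10*3 = 60 seconds. */
static int enable_keepalive(int fd)
{
    int on = 1, idle = 30, intvl = 10, cnt = 3;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
#ifdef TCP_KEEPIDLE
    /* Linux extensions; on other systems the kernel defaults apply */
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,  &idle,  sizeof idle);
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof intvl);
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,   &cnt,   sizeof cnt);
#endif
    return 0;
}
```

An application-level ping in the protocol itself would still be worth having, since it also detects a hung (rather than dead) slave.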
> >I can see a few different modes of working:
> >During ordinary play, if gnubg is set to some fairly demanding
> >standard - I often play against gnubg on 2 ply supremo settings, then
> >individual moves could be shared among analysers. I'm not sure how
> >cube decisions would be partitioned, but given a dice roll, I'd think
> >you could do something like:
> >gnubg builds a list of legal moves. It then divides that list up and
> >lets the gnubg-analysers pick the best moves out of a subset of the
> >legal ones, while gnubg does some for itself. When it completes its
> >selection, it then waits for the others to complete, gathers the
> >results and makes the final decision.
> > For 0 ply (and quite possibly 1 ply), there's
> >probably no point in using any remote analysis. For 2 ply you assume
> >that each legal move will take the same amount of time to analyse, so
> >in the situation postulated before of 2 slow analysers, given 12 legal
> >moves, gnubg would do 10 and pass 1 to each of the other machines.
> I didn't write anything for n-ply evaluation yet (only rollouts, see below)
> I agree with you that it's no use doing anything with 0-ply and 1-ply evals.
> I don't think it's a good idea to split the work before starting to
> compute. Rather, divide the work (e.g., roughly speaking, divide a 2-ply
> eval into, say, 18 1-ply evals if there are 18 legal moves) and have
> these 18 tasks given to available threads/processes (processing units
> in my terminology) -- you can't know beforehand the time that n-ply
> evaluations will take (it depends on the number of available moves in
> deeper plies), so it's hard to fairly dispatch the tasks. One main
> thread should do the "job management" (this I called the "task engine"
> in my implementation, run by the master thread: send tasks, collect
> results), and one more "computing thread" per processor crunches
> numbers; plus processes running on remote hosts and communicating with
> the master thread (actually with other threads that run locally on the
> master host and handle the individual TCP connections and exchange
That's certainly another approach and may be more efficient, assuming
the evaluation time is significant compared to the time to dispatch
and receive the results.
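The difference between the two policies can be shown without any threads at all. The sketch below (illustrative, not gnubg's task engine) simulates the "feed whoever is free first" loop with per-task costs: one slow task no longer drags the whole split down, because the other unit keeps picking up work.

```c
#include <assert.h>
#include <stddef.h>

/* Greedy dispatch: each task goes to whichever processing unit becomes
 * free first. Returns the makespan (finish time of the last unit) and
 * records how many tasks each unit ran. Simulation only. */
static double dispatch_greedy(const double *task_cost, size_t ntasks,
                              double *unit_busy_until, size_t nunits,
                              int *tasks_per_unit)
{
    for (size_t u = 0; u < nunits; u++) {
        unit_busy_until[u] = 0.0;
        tasks_per_unit[u] = 0;
    }
    for (size_t t = 0; t < ntasks; t++) {
        size_t best = 0;                    /* unit that frees up first */
        for (size_t u = 1; u < nunits; u++)
            if (unit_busy_until[u] < unit_busy_until[best])
                best = u;
        unit_busy_until[best] += task_cost[t];
        tasks_per_unit[best]++;
    }
    double makespan = 0.0;
    for (size_t u = 0; u < nunits; u++)
        if (unit_busy_until[u] > makespan)
            makespan = unit_busy_until[u];
    return makespan;
}
```

For costs {4,1,1,1,1} on two units, the greedy loop finishes at time 4 (one unit takes the slow task, the other runs the four cheap ones), whereas a fixed 50/50 split up front could finish as late as time 6.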
> >The main disadvantage to this is
> >that caching is largely defeated, since the forward evaluations of the
> >selected move are likely to have been made on a remote machine. Here
> >gnubg must decide how much work to hand to each analyser and how much
> >to keep for itself.
> I'm not sure cache impact would be so great; I think that cache can be
> mostly useful as you "travel down" the tree of possibilities (plies),
> since positions might reappear on deeper layers of the tree; I'd think
> there is less to share among different (parallel) parts of the tree
> when the first node is already different. I'm not sure if what I mean
> is clear...
If you were using my approach of using separate processes to choose
plays (e.g. the 2- or 3-ply evaluation of a given play may happen
entirely on a remote host), then you don't have the evaluations of the
following positions from the chosen play available at the start of the
next move, they only exist on the machine which did the evaluation of
the chosen move. If gnubg uses your method of dispatching the
individual plies, then the master will have all these intermediate
evaluations, but it wouldn't normally provide them to the other
processes when it begins doing the next move. In both cases, useful
cached data is lost between successive moves.
> >When analysing a match or game, the same procedure as the above
> >paragraph would be followed - gnubg would build the moverecords, then
> >begin handing them out to the remote analysers. The only difference
> >would be that the user would have given the analyse game/session/match
> >command, so gnubg itself would know that it could also do some of the
> >analysis itself - it would include itself in the list of available
> Yes, agreed.
> >As soon as an analyser completes a move, it is given the next
> >unprocessed one. In this case there's no need to guess at which
> >machine is the most powerful, they will each work to full capacity as
> >long as there are uncompleted moves.
> Agreed too. This meets the "feed slave as soon as ready" strategy I
> propose for position evals and that I implemented in rollouts.
> But I think cache will be important here; that is, if you have already
> analysed play N in the match with one processing unit, then you'd
> better give it play N+1 to analyse, because there surely will be plenty
> of useful data in the cache! I think we'll need some task dispatch
> algorithm that strives to give processing units as many consecutive
> plays to evaluate as possible, without precomputing the job splitting.
> Interesting problem... :-)
Once you get a few moves behind - say there are 6 unanalysed moves and
two slaves, one slave gets the 3 oldest moves and the other gets the 3
most recent moves.
> >Rollouts are the most interesting problem, because we want to be able
> >to interrupt and resume rollouts. Gnubg begins by deciding it will do
> >the first trial, then passing the next n trials to the
> >gnubg-analysers. During its rollout, it checks for analysers having
> >completed. If they have, it stores the results of that one rollout and
> >issues the next trial to the analyser. When gnubg completes a rollout,
> >it then takes the results of that rollout and any following ones in
> >the series (not including gaps) and builds the cumulative result.
> Agreed, this is also my "feed as soon as ready" philosophy.
> One thing: this can also be done on the Internet (why limit to a home
I never meant only a home network - hence my comment about
firewalls. It's just that it seems to be more and more common for
people to have 2 or 3 PCs available to themselves.
> One can even imagine "gnubg slave farms" to compute big rollouts or
> maybe help in training new neural nets (I don't know if training
> algorithms can be easily split down into subtasks to dispatch among
> cooperating slaves, like the SETI screensaver)
I think Joseph's method already works pretty well for this one.
> >Other thoughts:
> >To deal with firewall issues, it should be possible to specify the
> >ports to be used for both outbound connections from gnubg and inbound
> >to the analyser.
> >It would be a good idea to link to libwrap
> >The analyser should be allowed to work either from stdin/stdout (so it
> >can be invoked from inetd) or with a command-line option to listen on a
> >specific port.
> >All the exchanges should be done in ASCII, numerical values can be
> >exchanged in %.18g or similar format.
> >It's probably easiest, though not the most efficient, to exchange the
> >full analysis or rollout contexts with each service request and to
> >pass them back again as part of the results.
> Agreed. I've also implemented the above, to some degree. Details can
> be found in the txt document I pointed to at the beginning of this mail.
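One note on the %.18g idea: 18 significant digits are more than the 17 needed to represent an IEEE-754 double uniquely, so the ASCII exchange is lossless. A minimal round-trip check:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Format a double with "%.18g" (as proposed for the wire protocol) and
 * parse it back with strtod; the result is bit-exact, since 18 digits
 * exceed the 17 required for an exact binary64 round trip. */
static double ascii_roundtrip(double x)
{
    char buf[64];
    snprintf(buf, sizeof buf, "%.18g", x);
    return strtod(buf, NULL);
}
```

So equities and rollout statistics can go over the socket in plain text with no loss of precision, at the cost of a few extra bytes per value compared to a binary encoding.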
> Bottom line:
> I'm not sure it is necessary to go through a full rewrite of gnubg
> into master/slave at the GUI / engine level, as I've already done it at
> master thread / slave thread level, with only the master thread having
> access to GUI, and a single gnubg process being able to be switched
> to/from master/slave state.
> Please have a look at branch-multi, and try and compile it on your
> platforms. I know that it works on Macs (multi-threading and remote
> processing), on PCs (multi-threading only; I still have problems with
> winsock for remote processing), and I think it should work with most
> POSIX-compliant flavours of UNIX/Linux.
I hadn't been looking too closely before, as I thought it was all
geared towards threading; it sounds like you've got the bulk of the
work already done.
Jim Segrave address@hidden