Re: [Bug-gnubg] TD(lambda) training for neural networks -- a question


From: Øystein Johansen
Subject: Re: [Bug-gnubg] TD(lambda) training for neural networks -- a question
Date: Thu, 21 May 2009 21:18:44 +0200
User-agent: Thunderbird 2.0.0.21 (Windows/20090302)

boomslang wrote:
> Hi Øystein / others,
> 
> I didn't know gnubg used just TD(0). This does make things easier for
> me.  The Sutton/Barto you're referring to..., is that the book
> "Reinforcement Learning: An Introduction"?

Yes! It's even available online in HTML formatting.

> I do have a question about this supervised training, though. Could
> you give an indication of the number of games it takes to get a good
> kick start with TD(0), and how big should the database with
> positions/rollouts be for the supervised training?

What you need is some way to measure how strong your neural net is. This
is the really hard part. It's easy to make a neural net that plays
backgammon. It takes some patience, but it is easy. Finding out how
strong it plays -- that's hard. If you have nothing else to compare to,
I would play a head-to-head match against pubeval as a benchmark. Say
ten thousand cubeless money games against pubeval to measure the
strength of a neural net.
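
The reason for playing so many games is that the average result of a
match is noisy, and its standard error only shrinks with the square root
of the number of games. A minimal sketch of how such a match could be
summarized is below. The function names are hypothetical, and
simulated_game is only a stand-in with made-up outcome weights; a real
benchmark would play an actual game against pubeval there.

```python
import math
import random


def summarize_match(points):
    """Return (mean, standard error) of the points won per game."""
    n = len(points)
    mean = sum(points) / n
    # Sample variance of the per-game results.
    variance = sum((p - mean) ** 2 for p in points) / (n - 1)
    return mean, math.sqrt(variance / n)


def simulated_game(rng):
    """Hypothetical stand-in for one cubeless money game against pubeval.

    Returns points from the net's side: +/-1 single game, +/-2 gammon,
    +/-3 backgammon. The weights are invented for illustration only.
    """
    outcomes = [-3, -2, -1, 1, 2, 3]
    return rng.choices(outcomes, weights=[1, 12, 35, 38, 13, 1])[0]


if __name__ == "__main__":
    rng = random.Random(0)
    results = [simulated_game(rng) for _ in range(10_000)]
    mean, stderr = summarize_match(results)
    print(f"{mean:+.3f} points/game, 95% interval about +/- {1.96 * stderr:.3f}")
```

If two nets differ by less than that interval, the match has not really
separated them, and more games are needed.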

As an early-development kick start for training, using pubeval as the
benchmark, here is what I would do:

0. Start with initial weights.
1. Do a head-to-head benchmark against pubeval, and log the result.
2. Train TD(0) for some thousand games.
3. Repeat from 1. until no further improvements are observed.
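
The steps above could be sketched roughly as follows. This is an outline
under assumptions, not gnubg code: train and benchmark are hypothetical
callables you would supply (TD(0) self-play and a match against pubeval),
and the patience parameter is an added guard so that one noisy benchmark
result does not stop training too early.

```python
def train_until_plateau(weights, train, benchmark,
                        games_per_round=5000, patience=3):
    """Alternate benchmarking and training until the score stops improving.

    train(weights, n_games) -> new weights   (hypothetical, e.g. TD(0))
    benchmark(weights)      -> score, higher is better (hypothetical)
    Returns the best weights seen and the list of logged scores.
    """
    history = []
    best_score = float("-inf")
    best_weights = weights
    rounds_without_gain = 0
    while True:
        score = benchmark(weights)                   # step 1: benchmark
        history.append(score)                        #         and log it
        if score > best_score:
            best_score, best_weights = score, weights
            rounds_without_gain = 0
        else:
            rounds_without_gain += 1
        if rounds_without_gain >= patience:          # step 3: stop on plateau
            return best_weights, history
        weights = train(weights, games_per_round)    # step 2: train


if __name__ == "__main__":
    # Toy stand-ins: "weights" is one number that training raises up to 3.
    toy_train = lambda w, n_games: min(w + 1.0, 3.0)
    toy_benchmark = lambda w: w
    print(train_until_plateau(0.0, toy_train, toy_benchmark, patience=2))
```

Keeping the best weights seen so far, rather than the last ones, means a
late round of training that makes the net worse costs nothing.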

In gnubg we use a database of more than 100,000 positions where we have
rolled out the best moves. Each position is presented to the neural net
and we see if it picks the best move. This is used to benchmark the
neural net evaluators. However, you need a neural net evaluator to roll
out such a benchmark database. That seems like a catch-22, and that's why
I suggest pubeval. (You can of course also get our benchmark database.)
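
To illustrate the idea of a position benchmark, here is a small sketch.
The data layout is an assumption made for the example and is not the
format of the gnubg benchmark database: each position is taken to be a
list of (candidate, rollout_equity) pairs, and evaluate is a hypothetical
function returning the net's estimate for a candidate. Besides the
fraction of best moves found, the sketch also reports the average
rollout equity given up, which is an addition for illustration.

```python
def benchmark_on_positions(positions, evaluate):
    """Score an evaluator against rolled-out positions.

    positions: list of positions, each a list of (candidate, rollout_equity)
               pairs (assumed layout, for illustration only).
    evaluate:  hypothetical function mapping a candidate to the net's
               equity estimate; the net plays the candidate it rates highest.
    Returns (fraction of positions where the net picks a best move,
             mean rollout equity lost by the net's choices).
    """
    hits = 0
    total_loss = 0.0
    for candidates in positions:
        best_equity = max(equity for _, equity in candidates)
        _, picked_equity = max(candidates, key=lambda c: evaluate(c[0]))
        total_loss += best_equity - picked_equity
        if picked_equity == best_equity:
            hits += 1
    n = len(positions)
    return hits / n, total_loss / n


if __name__ == "__main__":
    toy_positions = [
        [("a", 0.5), ("b", 0.2)],    # net should prefer "a"
        [("c", -0.1), ("d", 0.3)],   # net should prefer "d"
    ]
    toy_net = {"a": 1.0, "b": 0.0, "c": 1.0, "d": 0.0}.get
    print(benchmark_on_positions(toy_positions, toy_net))
```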

To summarize:
* You need a system to benchmark your neural nets,
  so you get an idea of whether they are improving or not.
* Keep a log of your results. (Important!)
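
Keeping the log can be as simple as appending one line per benchmark run
to a CSV file. A minimal sketch, where the file name and column names
are arbitrary choices for the example:

```python
import csv
import os
from datetime import datetime, timezone


def log_result(path, games_trained, score):
    """Append one benchmark result to a CSV log, writing a header if new."""
    is_new = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["timestamp", "games_trained", "score"])
        timestamp = datetime.now(timezone.utc).isoformat()
        writer.writerow([timestamp, games_trained, score])


if __name__ == "__main__":
    # Example values only; they are not real results.
    log_result("benchmark_log.csv", 5000, -0.412)
```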

> Thanks again, I appreciate your help.

Keep up the good work! Just remember you need a lot of code before you
can start any training. Don't lose your motivation before you're there.
May I ask how far you've come?

-Øystein
