Subject: [Bug-gnubg] TD-Gammon input, output encoding schemes
Date: Wed, 16 Apr 2003 18:14:11 +0700
I have been researching how to apply reinforcement learning, specifically TD(lambda), to some game projects, and among the things I looked into are Gerald Tesauro's TD-Gammon and GNUBG. The biggest obstacle to my understanding is the input and output encoding scheme.
As I understand it, Tesauro used a three-layer network with 198 input units, 40 hidden units, and 4 output units, and updated the connection weights with the TD(lambda) formula.
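To make my question concrete, here is my current guess at how the 198 input units are built, following the description in Tesauro's papers. The function names and the exact ordering of units are my own assumptions, not GNUBG's actual code; only the unit counts (4 units per point per player, plus bar, borne-off, and side-to-move units) come from the papers.

```python
# Sketch of the commonly described 198-unit TD-Gammon input
# encoding. Ordering of units and helper names are assumptions;
# the 4-units-per-point scheme follows Tesauro's description.

def encode_point(n):
    """4 units for one player's checkers on a single point."""
    return [
        1.0 if n >= 1 else 0.0,            # at least one checker (blot)
        1.0 if n >= 2 else 0.0,            # at least two (made point)
        1.0 if n >= 3 else 0.0,            # at least three
        (n - 3) / 2.0 if n > 3 else 0.0,   # extra checkers, scaled
    ]

def encode_board(points_x, points_o, bar_x, bar_o, off_x, off_o, x_to_move):
    """points_x / points_o: checker counts per point (24 entries each)."""
    units = []
    for n in points_x:
        units += encode_point(n)
    for n in points_o:
        units += encode_point(n)
    units += [bar_x / 2.0, bar_o / 2.0]    # checkers on the bar, scaled
    units += [off_x / 15.0, off_o / 15.0]  # checkers borne off, scaled
    units += [1.0, 0.0] if x_to_move else [0.0, 1.0]  # side to move
    return units  # 24*4*2 + 2 + 2 + 2 = 198 units
```

If this reading is right, almost all inputs stay in (0, 1), which would fit the log-sigmoid units mentioned below.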
Typically, a neural net with a log-sigmoid activation uses input values in the range (0, 1). My questions are: what exactly are the values fed to the input units and produced by the output units? Is TD(lambda) applied at each time step (ply, i.e. half-move)? Is there a clearer document explaining TD(lambda) and TD-Gammon?
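For reference, my current understanding of the per-half-move TD(lambda) update, using eligibility traces as in Sutton and Barto, looks roughly like the sketch below. The toy network (single sigmoid output, no hidden layer) and the values of alpha and lambda are placeholders to show the update rule only, not what Tesauro or GNUBG actually use.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TinyNet:
    """One sigmoid output, no hidden layer: enough to show the rule."""
    def __init__(self, n_inputs, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(scale=0.1, size=n_inputs)

    def forward(self, x):
        return sigmoid(self.w @ x)

    def grad(self, x):
        y = self.forward(x)
        return y * (1.0 - y) * x   # d(output)/d(w) for a sigmoid unit

def td_lambda_episode(net, states, final_reward, alpha=0.1, lam=0.7):
    """One game: states is the sequence of encoded positions seen by
    the learner; final_reward is the game outcome (e.g. 1.0 for a win)."""
    e = np.zeros_like(net.w)              # eligibility trace
    for t in range(len(states)):
        x = states[t]
        y_t = net.forward(x)
        e = lam * e + net.grad(x)         # decay trace, add new gradient
        # TD target: next prediction, or the true outcome at game end
        y_next = final_reward if t == len(states) - 1 else net.forward(states[t + 1])
        net.w += alpha * (y_next - y_t) * e   # TD(lambda) weight update
    return net
```

So the update would indeed run once per half-move, with the final reward replacing the prediction at the terminal position. I would be grateful if someone could confirm or correct this.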
I am currently reading these documents:
Reinforcement Learning: An Introduction (Richard Sutton and Andrew Barto)
Practical Issues in Temporal Difference Learning (Gerald Tesauro)
Temporal Difference Learning and TD-Gammon (Communications of the ACM, March 1995, Vol. 38, No. 3)
Any guidance would be appreciated.
Thanks in advance.