From: Truong Khanh
Subject: [Bug-gnubg] TD-Gammon input, output encoding shemes
Date: Wed, 16 Apr 2003 18:14:11 +0700

Hello all,

I have been researching how to apply reinforcement learning, specifically
TD(lambda), to some game projects, and I have looked into Gerald Tesauro's
TD-Gammon and GNUBG. A big obstacle for me is understanding the input and
output encoding scheme.
As far as I know, Tesauro used a three-layer network with 198 input units,
40 hidden units, and 4 output units, and updated the connection weights
with the TD(lambda) rule.
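To make the architecture above concrete, here is a minimal sketch of a three-layer sigmoid network with a TD(lambda) weight update. It is not Tesauro's or GNUBG's code. The class name `TDNet`, the parameters `alpha` and `lam`, the weight initialisation range, the absence of bias units, the lack of discounting, and the use of one eligibility trace per output unit are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    """Logistic (log-sigmoid) activation."""
    return 1.0 / (1.0 + math.exp(-x))


class TDNet:
    """Hypothetical input -> hidden -> output sigmoid net trained with TD(lambda)."""

    def __init__(self, n_in=198, n_hid=40, n_out=4, alpha=0.1, lam=0.7, seed=0):
        rng = random.Random(seed)
        self.n_in, self.n_hid, self.n_out = n_in, n_hid, n_out
        self.alpha, self.lam = alpha, lam
        # Small random starting weights (range chosen arbitrarily).
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hid)]
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hid)] for _ in range(n_out)]
        self.reset_traces()

    def reset_traces(self):
        """Clear eligibility traces; call at the start of every game."""
        self.e2 = [[0.0] * self.n_hid for _ in range(self.n_out)]
        self.e1 = [[[0.0] * self.n_in for _ in range(self.n_hid)]
                   for _ in range(self.n_out)]

    def forward(self, x):
        """Return (hidden activations, output activations) for input vector x."""
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        y = [sigmoid(sum(w * hj for w, hj in zip(row, h))) for row in self.w2]
        return h, y

    def accumulate(self, x):
        """Decay the traces by lam, add the gradient of each output at x, return outputs."""
        h, y = self.forward(x)
        for k in range(self.n_out):
            dy = y[k] * (1.0 - y[k])
            for j in range(self.n_hid):
                # d y_k / d w2[k][j]
                self.e2[k][j] = self.lam * self.e2[k][j] + dy * h[j]
                # d y_k / d w1[j][i] = dh * x[i]
                dh = dy * self.w2[k][j] * h[j] * (1.0 - h[j])
                row = self.e1[k][j]
                for i in range(self.n_in):
                    row[i] = self.lam * row[i] + dh * x[i]
        return y

    def update(self, y_prev, target):
        """Apply w += alpha * (target_k - y_prev_k) * trace_k for every output k."""
        for k in range(self.n_out):
            delta = self.alpha * (target[k] - y_prev[k])
            for j in range(self.n_hid):
                self.w2[k][j] += delta * self.e2[k][j]
                row_w, row_e = self.w1[j], self.e1[k][j]
                for i in range(self.n_in):
                    row_w[i] += delta * row_e[i]
```

In this sketch the update would run once per time step: call `accumulate` on the current position, then call `update` with the network's own outputs for the next position as `target`, or with the actual game result when the game ends.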
Typically, a neural net uses the pair (0,1) as input values, together with
the log-sigmoid function. My questions are: what values do the input and
output units take? Is the TD(lambda) update applied at every time step
(ply, or half-move)? Is there a clearer document that explains TD(lambda)
and TD-Gammon?
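For illustration, the following is a minimal sketch of one published description of a 198-unit input encoding, following the account in the first edition of Sutton and Barto's book as far as it can be reconstructed here. The function names, the argument layout, and the ordering of units inside the vector are assumptions, and GNUBG's own inputs may well differ.

```python
def encode_point(n):
    """Four units for n checkers of one colour on a single point."""
    return [
        1.0 if n >= 1 else 0.0,           # at least one checker
        1.0 if n >= 2 else 0.0,           # at least two checkers
        1.0 if n >= 3 else 0.0,           # at least three checkers
        (n - 3) / 2.0 if n > 3 else 0.0,  # checkers beyond three, scaled
    ]


def encode_position(white, black, white_bar, black_bar,
                    white_off, black_off, white_to_move):
    """Build the 198-unit vector from checker counts.

    white, black: lists of 24 checker counts, one entry per point
    (hypothetical layout chosen for this sketch).
    """
    assert len(white) == 24 and len(black) == 24
    units = []
    for n_white, n_black in zip(white, black):
        units += encode_point(n_white) + encode_point(n_black)  # 24 * 8 = 192
    units += [white_bar / 2.0, black_bar / 2.0]                 # checkers on the bar
    units += [white_off / 15.0, black_off / 15.0]               # checkers borne off
    units += [1.0, 0.0] if white_to_move else [0.0, 1.0]        # side to move
    return units
```

Under this scheme most units are 0 or 1, but the fourth unit of a point and the bar units can exceed 1, so the inputs are not strictly binary.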
Currently I am reading these documents:

Reinforcement Learning: An Introduction (Richard Sutton and Andrew Barto)
Practical Issues in Temporal Difference Learning (Gerald Tesauro)
Temporal Difference Learning and TD-Gammon (published in Communications of the ACM, March 1995 / Vol. 38, No. 3)

Any guidance will be appreciated.

Thanks in advance.
-------------------------------------------
Nguyen Truong Khanh
Software Engineer
Glass Egg Digital Media
E-Town Building, 7th Floor
364 Cong Hoa Street, TanBinh District
Ho Chi Minh City, Vietnam
Tel: (84) 8810-9018
Fax: (84) 8810-9013
address@hidden
www.glassegg.com