[Bug-gnubg] TD-Gammon input, output encoding schemes

From: Truong Khanh
Subject: [Bug-gnubg] TD-Gammon input, output encoding schemes
Date: Wed, 16 Apr 2003 18:14:11 +0700

Hello all,
    I have been researching how to apply reinforcement learning, specifically TD(lambda), to some game projects, and I have been looking into Gerald Tesauro's TD-Gammon and GNUBG. The main obstacle to my understanding is the input and output encoding scheme.
    As far as I know, Tesauro used a 3-layer network with 198 input units, 40 hidden units, and 4 output units, and updated the connection weights with the TD(lambda) rule.
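
    To make the architecture above concrete, here is a minimal sketch of a 198-40-4 sigmoid network trained with TD(lambda) and eligibility traces. It is an illustration under stated assumptions, not Tesauro's or GNUBG's actual code: the class name `TDNet`, the helper `train_episode`, the absence of bias units, the weight initialisation range, and the values of `alpha` and `lam` are all hypothetical choices made for this example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TDNet:
    """Small fully connected sigmoid net (no biases) with TD(lambda) traces."""

    def __init__(self, n_in=198, n_hid=40, n_out=4, alpha=0.1, lam=0.7, seed=0):
        rng = random.Random(seed)
        self.alpha, self.lam = alpha, lam
        # Assumption: small random initial weights.
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hid)]
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hid)] for _ in range(n_out)]
        self.reset_traces()

    def reset_traces(self):
        n_out, n_hid, n_in = len(self.w2), len(self.w1), len(self.w1[0])
        # One trace per weight and per output unit.
        self.e2 = [[0.0] * n_hid for _ in range(n_out)]
        self.e1 = [[[0.0] * n_in for _ in range(n_hid)] for _ in range(n_out)]

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        y = [sigmoid(sum(w * hj for w, hj in zip(row, h))) for row in self.w2]
        return h, y

    def td_step(self, x, target):
        """Decay traces, add the gradient at x, then move weights toward target."""
        h, y = self.forward(x)
        for k, yk in enumerate(y):
            gk = yk * (1.0 - yk)        # sigmoid derivative at output k
            delta = target[k] - yk      # TD error for output k
            for j, hj in enumerate(h):
                # Gradient factor for hidden unit j, using the pre-update weight.
                back = gk * self.w2[k][j] * hj * (1.0 - hj)
                self.e2[k][j] = self.lam * self.e2[k][j] + gk * hj
                self.w2[k][j] += self.alpha * delta * self.e2[k][j]
                e_row, w_row = self.e1[k][j], self.w1[j]
                for i, xi in enumerate(x):
                    e_row[i] = self.lam * e_row[i] + back * xi
                    w_row[i] += self.alpha * delta * e_row[i]
        return y


def train_episode(net, states, outcome):
    """One update per time step: the target is the next prediction,
    or the actual game outcome at the final step."""
    net.reset_traces()
    for t, x in enumerate(states):
        if t + 1 < len(states):
            _, target = net.forward(states[t + 1])
        else:
            target = outcome
        net.td_step(x, target)
```

    In this sketch `states` is the list of encoded positions seen during one game and `outcome` is a list with one value per output unit, so the weights change once per time step rather than once per game.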
    Typically, a neural net uses the values 0 and 1 for its inputs, along with the log-sigmoid activation function. My question is: what values do the input and output units take? Is TD(lambda) applied at every time step (ply, or half-move)? Is there a clearer document that explains TD(lambda) and TD-Gammon?
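
    For reference, the sketch below shows one reading of the 198-unit input encoding as it is described in Sutton and Barto's book: four units per colour per point (192 units), two units for the bar, two for pieces borne off, and two for the side to move. The function names, the argument layout, and the order of the units are assumptions made for this example; the original network may arrange them differently, and this is not taken from GNUBG.

```python
def encode_point(n):
    """Four units describing n pieces of one colour on one point."""
    return [
        1.0 if n == 1 else 0.0,            # exactly one piece (a blot)
        1.0 if n >= 2 else 0.0,            # two or more pieces (a made point)
        1.0 if n == 3 else 0.0,            # exactly three pieces (single spare)
        (n - 3) / 2.0 if n > 3 else 0.0,   # extra pieces beyond three, scaled
    ]


def encode_board(white, black, white_bar, black_bar,
                 white_off, black_off, white_to_move):
    """Return a list of 198 input values for one position.

    white, black: lists of 24 piece counts, one per point (hypothetical layout).
    """
    assert len(white) == 24 and len(black) == 24
    units = []
    for n_white, n_black in zip(white, black):
        units += encode_point(n_white) + encode_point(n_black)   # 24 * 8 = 192
    units += [white_bar / 2.0, black_bar / 2.0]                  # pieces on the bar
    units += [white_off / 15.0, black_off / 15.0]                # pieces borne off
    units += [1.0, 0.0] if white_to_move else [0.0, 1.0]         # side to move
    return units


# Usage: the standard starting position, white to move.
white = [0] * 24
black = [0] * 24
white[23], white[12], white[7], white[5] = 2, 5, 3, 5
black[0], black[11], black[16], black[18] = 2, 5, 3, 5
x = encode_board(white, black, 0, 0, 0, 0, True)
```

    Under this reading most inputs are 0 or 1, but the fourth unit of each group and the bar and borne-off units can take other values.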
    I am currently reading the following documents:
    Reinforcement Learning: An Introduction (Richard Sutton and Andrew Barto)
    Practical Issues in Temporal Difference Learning (Gerald Tesauro)
    Temporal Difference Learning and TD-Gammon (published in Communications of the ACM, March 1995 / Vol. 38, No. 3)
    Any guidance would be appreciated.
    Thanks in advance
Nguyen Truong Khanh
Software Engineer
Glass Egg Digital Media
E-Town Building, 7th Floor
364 Cong Hoa Street, TanBinh District
Ho Chi Minh City, Vietnam
Tel: (84) 8810-9018
Fax: (84) 8810-9013
