Subject: [Bug-gnubg] TD-Gammon input, output encoding schemes
Date: Wed, 16 Apr 2003 18:14:11 +0700
I have been researching how to apply reinforcement learning, specifically TD(lambda), to some game projects, and among the things I looked into are Gerald Tesauro's TD-Gammon and GNUBG. The biggest obstacle to my understanding is the input and output encoding scheme.
As I understand it, Tesauro used a three-layer network with 198 input units, 40 hidden units, and 4 output units, and updated the connection weights with the TD(lambda) formula.
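To make my question concrete, here is my current guess at how the 198 input units are built, following the description in Tesauro's papers. The function names and the exact ordering of units are my own assumptions, not GNUBG's actual code; only the unit counts (4 units per point per player, plus bar, borne-off, and side-to-move units) come from the papers.

```python
# Sketch of the commonly described 198-unit TD-Gammon input
# encoding. Ordering of units and helper names are assumptions;
# the 4-units-per-point scheme follows Tesauro's description.

def encode_point(n):
    """4 units for one player's checkers on a single point."""
    return [
        1.0 if n >= 1 else 0.0,            # at least one checker (blot)
        1.0 if n >= 2 else 0.0,            # at least two (made point)
        1.0 if n >= 3 else 0.0,            # at least three
        (n - 3) / 2.0 if n > 3 else 0.0,   # extra checkers, scaled
    ]

def encode_board(points_x, points_o, bar_x, bar_o, off_x, off_o, x_to_move):
    """points_x / points_o: checker counts per point (24 entries each)."""
    units = []
    for n in points_x:
        units += encode_point(n)
    for n in points_o:
        units += encode_point(n)
    units += [bar_x / 2.0, bar_o / 2.0]    # checkers on the bar, scaled
    units += [off_x / 15.0, off_o / 15.0]  # checkers borne off, scaled
    units += [1.0, 0.0] if x_to_move else [0.0, 1.0]  # side to move
    return units  # 24*4*2 + 2 + 2 + 2 = 198 units
```

If this reading is right, almost all inputs stay in (0, 1), which would fit the log-sigmoid units mentioned below.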
Typically, a neural net with a log-sigmoid activation uses input values in the range (0, 1). My questions are: what exactly are the values fed to the input units and produced by the output units? Is TD(lambda) applied at each time step (ply, i.e. half-move)? Is there a clearer document explaining TD(lambda) and TD-Gammon?
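For reference, my current understanding of the per-half-move TD(lambda) update, using eligibility traces as in Sutton and Barto, looks roughly like the sketch below. The toy network (single sigmoid output, no hidden layer) and the values of alpha and lambda are placeholders to show the update rule only, not what Tesauro or GNUBG actually use.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TinyNet:
    """One sigmoid output, no hidden layer: enough to show the rule."""
    def __init__(self, n_inputs, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(scale=0.1, size=n_inputs)

    def forward(self, x):
        return sigmoid(self.w @ x)

    def grad(self, x):
        y = self.forward(x)
        return y * (1.0 - y) * x   # d(output)/d(w) for a sigmoid unit

def td_lambda_episode(net, states, final_reward, alpha=0.1, lam=0.7):
    """One game: states is the sequence of encoded positions seen by
    the learner; final_reward is the game outcome (e.g. 1.0 for a win)."""
    e = np.zeros_like(net.w)              # eligibility trace
    for t in range(len(states)):
        x = states[t]
        y_t = net.forward(x)
        e = lam * e + net.grad(x)         # decay trace, add new gradient
        # TD target: next prediction, or the true outcome at game end
        y_next = final_reward if t == len(states) - 1 else net.forward(states[t + 1])
        net.w += alpha * (y_next - y_t) * e   # TD(lambda) weight update
    return net
```

So the update would indeed run once per half-move, with the final reward replacing the prediction at the terminal position. I would be grateful if someone could confirm or correct this.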
I am currently reading these documents:
Reinforcement Learning: An Introduction (Richard Sutton and Andrew Barto)
Practical Issues in Temporal Difference Learning (Gerald Tesauro)
Temporal Difference Learning and TD-Gammon (Communications of the ACM, March 1995, Vol. 38, No. 3)
Any guidance would be appreciated.
Thanks in advance.