Subject: [Bug-gnubg] Re: Training neural nets: How does size matter?
Date: Mon, 02 Sep 2002 19:31:39 +1200
Quoting Joern Thyssen <address@hidden>:

On Fri, Aug 30, 2002 at 09:56:28AM +0200, Øystein O Johansen wrote:
> Hi,
>
> gnubg uses a race network, a contact network and a crashed network.
> I think this scheme works OK. The crashed network is not mature
> yet, but there is work in progress. There have also been discussions
> about splitting into more classes. I guess this will be done
> eventually, but we must take one step at a time. Two years ago
> there was also a network called BPG (Backgame and Prime). There
> were real problems with this net, so it was removed.

As far as I remember, the problem back then was that a position could be BPG, then turn into contact, and back into BPG.

That does make it more difficult to bootstrap, but was that the real problem?
A very real problem. I eventually realized the problem could be solved simply by training the contact net with a better data set, obtained by several phases of rollouts and 2-ply evaluations.
There are discontinuities at the boundaries between two neural nets, i.e., they evaluate the same position differently. For example, that may lead to the program refusing to break contact, as the contact neural net overestimates the contact position. As I remember it, the big problem with the BPG net was huge discontinuities at the boundaries.

The conclusion was that we should avoid "loops". For example, currently we have:

    Contact ----------- Race ------ Bearoff
          \              /
           \__ Crashed __/

so no loops. Another way to avoid the discontinuities is the meta-pi scheme, since this will make the evaluations continuous. The price is that you, in general, have to evaluate two neural nets.

Would two neural nets and a meta-pi system be better than one neural net of twice the size? I don't see the advantage, abstractly, although I can imagine that one would mainly focus on the race in a midpoint vs. midpoint contact position. (On the other hand, Walter Trice mentioned a very interesting midpoint vs. midpoint position that would be a big pass due to the race, but was a take due to the pips wasted on repeated x-1's.)

I call the discontinuities "blemishes" after Boyan. I think both humans and neural nets face a problem related to blemishes when considering radically different positions that can result from playing doubles in different ways. It hardly matters if the contact net is used to evaluate both a blitz and an outside prime, as there is little reason for the evaluations to be consistent. One solution I try in my own play is to produce better absolute evaluations, e.g., "After this move, am I the favorite? Should I accept if my opponent offers me a point on a 2-cube?" This has prevented a few massive blunders, and I think it is actually a strength of bots, not a weakness.

How did you try to find a representative set of data to train the backgame net?
I use data both from bots playing against humans and from self-play. The problem is that there is not enough "unfamiliar" territory there, i.e., a player can easily lead the current nets into hostile areas by slotting wildly. I think GNU is still weaker here than JF, mloner and perhaps SN as well. While I have made several improvements in the past, I think there is still a long way to go.
Is there an archive of the discussions?

Douglas Zare
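The two ideas from the thread, avoiding "loops" between evaluator classes and blending nets meta-pi style so the evaluation stays continuous across a class boundary, can be illustrated with a small Python sketch. It is not gnubg code and makes several assumptions: the edge directions in `TRANSITIONS` are read off the Contact/Race/Bearoff/Crashed diagram as the direction play moves a position, the extra "bpg" edges in the usage are hypothetical, and `contact_net`, `race_net`, the scalar feature `x` and the fixed sigmoid `gate` are made-up stand-ins (a real meta-pi system would learn its gating network from the position itself).

```python
import math
from collections import deque

# Assumed direction of play between evaluator classes, read off the
# Contact/Race/Bearoff/Crashed diagram (hypothetical, not gnubg code).
TRANSITIONS = {
    "contact": ["race", "crashed"],
    "crashed": ["race"],
    "race": ["bearoff"],
    "bearoff": [],
}


def has_loop(graph):
    """Return True if the class-transition graph contains a cycle.

    Uses Kahn's topological sort: a directed graph is acyclic exactly
    when every node can be removed in zero-in-degree order.
    """
    indegree = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            indegree[target] += 1
    queue = deque(node for node, deg in indegree.items() if deg == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for target in graph[node]:
            indegree[target] -= 1
            if indegree[target] == 0:
                queue.append(target)
    return removed != len(graph)


# Hypothetical stand-ins for two nets' equity estimates as a function of
# a made-up feature x in [0, 1] (0 = heavy contact, 1 = pure race).
def contact_net(x):
    return 0.30 + 0.10 * x


def race_net(x):
    return 0.10 + 0.10 * x


def hard_switch(x, boundary=0.5):
    """One net per class: the value jumps where the class changes."""
    return contact_net(x) if x < boundary else race_net(x)


def gate(x, boundary=0.5, sharpness=10.0):
    """Weight given to the race net; a fixed sigmoid stands in for a
    learned meta-pi gating network."""
    return 1.0 / (1.0 + math.exp(-sharpness * (x - boundary)))


def blended(x):
    """Gated mixture of both nets: continuous in x, but costs two evaluations."""
    g = gate(x)
    return (1.0 - g) * contact_net(x) + g * race_net(x)


if __name__ == "__main__":
    with_bpg = dict(TRANSITIONS, bpg=["contact"], contact=["race", "crashed", "bpg"])
    print(has_loop(TRANSITIONS), has_loop(with_bpg))  # False True
    eps = 1e-9
    print(hard_switch(0.5 - eps) - hard_switch(0.5))  # about 0.2
    print(blended(0.5 - eps) - blended(0.5))  # tiny, on the order of 1e-10
```

With a hard switch the two stand-in nets disagree by 0.2 at the boundary, the kind of jump that can make a program refuse to break contact; the blended value moves smoothly, at the price of evaluating both nets for every position near the boundary. Adding a class that play can leave and re-enter (the BPG case) shows up as a cycle.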