bug-gnubg

Re: current development


From: Øystein Schønning-Johansen
Subject: Re: current development
Date: Wed, 4 Dec 2019 21:23:09 +0100

But let's chat about the idea instead. What would it actually mean to 'apply "AlphaZero methods" to backgammon'?

AlphaZero (and AlphaGo, Lc0 and SugaR NN) is more or less the same thing as reinforcement learning in backgammon. So, as I understand it, it is rather AlphaZero that has applied the backgammon methods: both the chess and Go variants train with reinforcement learning, pretty much like the original GNU Backgammon, Jellyfish and Snowie. In Go they had to make a move-selection subroutine based on human play and then add MCTS to the training. The neural networks are also deeper and more complex, and the input features are more complex too; to some extent they resemble the convolutions known from convolutional neural networks (and the inputs are not properly described in the high-level articles).

Apart from that, it is actually the same thing: reinforcement learning.
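
For concreteness, the update this whole family of trainers is built on is the classic TD(lambda) rule from TD-Gammon. Here is a minimal Python sketch of it, with a linear evaluator for brevity (the real programs use multi-layer nets) and with hypothetical helpers (start_position, encode, best_move, roll_dice, game_over, result) standing in for a real backgammon engine; side-to-move bookkeeping is omitted:

    import numpy as np

    N_INPUTS = 250              # placeholder size of the input encoding
    w = np.zeros(N_INPUTS)      # linear value function, just for the sketch

    def value(x, w):
        """Estimated probability that the player on roll wins."""
        return 1.0 / (1.0 + np.exp(-np.dot(w, x)))

    def td_lambda_game(w, alpha=0.1, lam=0.7):
        """One self-play game, updating w with TD(lambda)."""
        board = start_position()          # hypothetical engine call
        x = encode(board)                 # hypothetical feature encoding
        e = np.zeros_like(w)              # eligibility traces
        while not game_over(board):
            board = best_move(board, roll_dice(), w)   # greedy 0-ply choice
            x_next = encode(board)
            v = value(x, w)
            target = result(board) if game_over(board) else value(x_next, w)
            e = lam * e + v * (1.0 - v) * x            # grad of sigmoid(w.x)
            w += alpha * (target - v) * e
            x = x_next
        return w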

But how can we improve? We believe (at least I do) that current backgammon bots are so strong that they play close to perfect in standard positions. It is in uncommon positions requiring long-term plans (like deep backgames and snake-rolling prime positions) that bots can still improve. Let me throw some ideas up in the air for discussion:

Can we make an RL algorithm that is so fast that it can learn on the fly? Say that during play we find a position where some indicator (which may be another challenge in itself) tells us that this is a position requiring long-term planning. If we then had the ability to RL-train a neural net for that specific position, that could be a huge improvement in my opinion. (Lots of details missing.)
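
To make the idea a bit more concrete, here is a rough sketch of what 'RL-train for this specific position' could mean: copy the current weights and run self-play episodes that all start from the suspicious position, applying the same TD(lambda) update as in the sketch above to the copy only. It reuses value(), encode(), best_move(), roll_dice(), game_over() and result() from that sketch, all hypothetical stand-ins rather than gnubg's API:

    import copy
    import numpy as np

    def finetune_for_position(w, root_position, episodes=1000,
                              alpha=0.02, lam=0.7):
        """Specialise a copy of the weights on one position by self-playing
        from it.  The helpers are the hypothetical ones from the TD sketch."""
        w_local = np.copy(w)                  # never touch the global weights
        for _ in range(episodes):
            board = copy.deepcopy(root_position)
            x = encode(board)
            e = np.zeros_like(w_local)        # eligibility traces
            while not game_over(board):
                board = best_move(board, roll_dice(), w_local)
                x_next = encode(board)
                v = value(x, w_local)
                target = result(board) if game_over(board) else value(x_next, w_local)
                e = lam * e + v * (1.0 - v) * x
                w_local += alpha * (target - v) * e
                x = x_next
        return w_local    # use only while the game stays in this neighbourhood

The open questions are exactly the ones above: how to detect that a position deserves this treatment, and how many episodes we can afford on the fly.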

And then: could the evaluations be improved if we specialised neural networks for specific position types, and then made a kind of net-selection system based on k-means clustering of the input features? I tried that many years ago with only four classes. Those experiments showed that it's not a hopeless approach, and with faster computers one could easily create many more than four classes (four was just the first number that popped into my head in those days).
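
A rough sketch of the routing side, using scikit-learn's k-means on the raw input features; encode() and the per-class evaluation nets are hypothetical stand-ins:

    import numpy as np
    from sklearn.cluster import KMeans

    def build_router(encoded_positions, n_classes=16, seed=0):
        """Cluster the input-feature vectors of a large set of positions;
        each cluster then gets its own evaluation network."""
        return KMeans(n_clusters=n_classes, random_state=seed).fit(encoded_positions)

    def evaluate(board, router, nets):
        """Route a position to the specialist net for its cluster."""
        x = encode(board)                         # hypothetical encoding
        cls = int(router.predict(x.reshape(1, -1))[0])
        return nets[cls].evaluate(x)              # hypothetical net API

Training would then mean filtering the self-play positions by cluster and training each specialist only on its own class.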

Then the next idea: what about huge-scale distributed rollouts? Maybe we could have a system like BOINC to do rollouts on the fly? I'm not sure how this would be used in a practical sense, and I'm not sure how hard it would be to implement (with or without the BOINC framework), but I'm just brainstorming here.
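
As a local stand-in for what a BOINC-style farm would do, here is a sketch that splits a rollout into independent, seeded work units and averages the results. play_one_game() is a hypothetical engine call; each chunk is self-contained, so it could just as well be shipped to a remote volunteer as run in a local process pool:

    import random
    from concurrent.futures import ProcessPoolExecutor
    from statistics import mean

    def rollout_chunk(args):
        """One work unit: play `trials` games from `position` with an
        independent RNG seed and return the average equity."""
        position, trials, seed = args
        rng = random.Random(seed)
        return mean(play_one_game(position, rng) for _ in range(trials))

    def distributed_rollout(position, workers=8, trials_per_chunk=162):
        """8 x 162 = 1296 trials, combined from independent work units."""
        chunks = [(position, trials_per_chunk, seed) for seed in range(workers)]
        with ProcessPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(rollout_chunk, chunks))
        return mean(results)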

-Øystein


On Wed, Dec 4, 2019 at 6:47 PM Joseph Heled <address@hidden> wrote:
I was intentionally rude because I thought his original post was inappropriate.

-Joseph

On Thu, 5 Dec 2019 at 06:42, Ralph Corderoy <address@hidden> wrote:
>
> Hi Joseph,
>
> > I thought so.
> >
> > I had the same idea the day I heard they cracked go, but just saying
> > something is a good idea is not helpful at all in my book.
>
> I think you're wrong.  And also a bit rude to boot.
>
> It's fine for Tim to suggest or ponder an idea to the list.  It may
> encourage another subscriber, or draw out news of what a lurker has been
> working on that's related.
>
> --
> Cheers, Ralph.
>

