Improving Fictitious Play Reinforcement Learning with Expanding Models

11/27/2019
by Rong-Jun Qin, et al.
Fictitious play with reinforcement learning is a general and effective framework for zero-sum games. However, implementing fictitious play with current deep neural network models faces crucial challenges. Neural network training uses gradient descent to update all connection weights, so a model trained to beat new opponents tends to forget how to beat old ones. Existing approaches often maintain a pool of historical policy models to avoid this forgetting. However, learning to beat a pool in stochastic games, i.e., a wide distribution over policy models, is either sample-consuming or, with a limited number of samples, insufficient to exploit all models. In this paper, we propose a learning process with neural fictitious play to alleviate these issues. We train a single policy model that consists of sub-models and a selector. Every time the model faces a new opponent, it is expanded by adding a new sub-model, and only the new sub-model is updated instead of the whole model. At the same time, the selector is updated to mix the new sub-model with the previous ones at the state level, so that the model is maintained as a behavior strategy instead of a wide distribution over policy models. Experiments on Kuhn poker, a grid-world Treasure Hunting game, and Mini-RTS environments show that the proposed approach alleviates the forgetting problem and consequently improves the learning efficiency and robustness of neural fictitious play.
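
The expanding-model idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes tabular logits in place of neural networks, and every name in it (`ExpandingPolicy`, `expand`, `update_newest`, `update_selector`) is hypothetical. It shows only the structure described above: earlier sub-models stay frozen, the newest one is writable, and a selector mixes their action distributions per state.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ExpandingPolicy:
    """Hypothetical tabular stand-in: sub-models plus a state-level selector."""

    def __init__(self, num_actions):
        self.num_actions = num_actions
        self.sub_models = []   # each sub-model: dict state -> action logits
        self.selector = {}     # dict state -> one logit per sub-model
        self.expand()          # start with a single sub-model

    def expand(self):
        """Add a new sub-model (assumed to happen when a new opponent appears)."""
        self.sub_models.append({})

    def _selector_logits(self, state):
        # Pad with zeros so states seen before an expansion cover new sub-models.
        logits = self.selector.setdefault(state, [])
        logits.extend([0.0] * (len(self.sub_models) - len(logits)))
        return logits

    def update_newest(self, state, action_logits):
        """Write only to the newest sub-model; older sub-models are never touched."""
        self.sub_models[-1][state] = list(action_logits)

    def update_selector(self, state, selector_logits):
        """Set the per-state mixing logits, one entry per sub-model."""
        if len(selector_logits) != len(self.sub_models):
            raise ValueError("need one selector logit per sub-model")
        self.selector[state] = list(selector_logits)

    def action_probs(self, state):
        """Mix the sub-models' action distributions with the selector weights."""
        weights = softmax(self._selector_logits(state))
        default = [0.0] * self.num_actions
        mixed = [0.0] * self.num_actions
        for weight, sub_model in zip(weights, self.sub_models):
            probs = softmax(sub_model.get(state, default))
            for action, p in enumerate(probs):
                mixed[action] += weight * p
        return mixed


policy = ExpandingPolicy(num_actions=2)
policy.update_newest("s", [5.0, 0.0])    # learned against the first opponent
policy.expand()                          # a new opponent arrives
policy.update_newest("s", [0.0, 5.0])    # only the new sub-model changes
policy.update_selector("s", [10.0, -10.0])
probs = policy.action_probs("s")
```

Because the mixture is formed per state, the result is a single distribution over actions at each state rather than a distribution over whole policies. In a real system the hand-set logits would be replaced by learned network outputs.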

research
02/06/2019

Neural Fictitious Self-Play on ELF Mini-RTS

Despite the notable successes in video games such as Atari 2600, current...
research
01/07/2020

Frosting Weights for Better Continual Training

Training a neural network model can be a lifelong learning process and i...
research
09/13/2020

Efficient Competitive Self-Play Policy Optimization

Reinforcement learning from self-play has recently reported many success...
research
02/23/2023

Targeted Search Control in AlphaZero for Effective Policy Improvement

AlphaZero is a self-play reinforcement learning algorithm that achieves ...
research
08/21/2021

Temporal Induced Self-Play for Stochastic Bayesian Games

One practical requirement in solving dynamic games is to ensure that the...
research
09/17/2020

Finding Effective Security Strategies through Reinforcement Learning and Self-Play

We present a method to automatically find security strategies for the us...
research
02/21/2020

Efficient Learning of Model Weights via Changing Features During Training

In this paper, we propose a machine learning model, which dynamically ch...
