Learning to Play Two-Player Perfect-Information Games without Knowledge

by Quentin Cohen-Solal et al.

In this paper, we propose several techniques for learning game state evaluation functions by reinforcement. The first is a generalization of tree bootstrapping (tree learning), adapted to the context of reinforcement learning without knowledge based on non-linear functions. With this technique, no information is lost during the reinforcement learning process. The second is a modification of minimax with unbounded depth that extends the best sequences of actions to the terminal states. This modified search is intended to be used during the learning process. The third is to replace the classic gain of a game (+1 / -1) with a reinforcement heuristic. We study particular reinforcement heuristics such as quick wins and slow defeats, scoring, mobility, and presence. The fourth is another variant of unbounded minimax, which plays the safest action instead of the best action. This modified search is intended to be used after the learning process. The fifth is a new action selection distribution. Our experiments suggest that these techniques improve the level of play. Finally, we apply these techniques to design program-players for the game of Hex (sizes 11 and 13) that surpass the level of Mohex 2.0 through reinforcement learning from self-play without knowledge. At Hex size 11 (without swap), the program-player reaches the level of Mohex 3HNN.
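To illustrate the "quick wins and slow defeats" idea behind reinforcement heuristics, here is a minimal sketch of a depth-aware terminal reward that replaces the classic +1 / -1 gain. The function name and the linear scaling are hypothetical choices for illustration; the paper's exact heuristic may differ.

```python
def terminal_reward(won: bool, moves_played: int, max_moves: int) -> float:
    """Depth-aware terminal reward: prefer quick wins and slow defeats.

    A win reached in fewer moves is rewarded more; a loss postponed to
    a later move is penalized less. The 0.5 scaling factor is an
    illustrative assumption, not the paper's parameterization.
    """
    frac = moves_played / max_moves  # fraction of the game played, in (0, 1]
    if won:
        return 1.0 - 0.5 * frac      # quick win -> reward closer to 1.0
    return -1.0 + 0.5 * frac         # slow defeat -> penalty closer to -0.5
```

Because every win still scores strictly above every loss, a learned evaluation trained on such targets preserves the game-theoretic preference order while breaking ties in favor of shorter wins and longer defeats.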


