Ordinal Monte Carlo Tree Search

01/14/2019
by   Tobias Joppen, et al.

In many problem settings, most notably in game playing, an agent receives a possibly delayed reward for its actions. Often, those rewards are handcrafted rather than naturally given. Even simple terminal-only rewards, such as 1 for a win and -1 for a loss, cannot be seen as an unbiased statement: the values are chosen arbitrarily, and the behavior of the learner may change with a different encoding, such as setting the value of a loss to -0.5, which is often done in practice to encourage learning. It is hard to argue about what makes a good reward, and the performance of an agent often depends on the design of the reward signal. In particular, in domains where states by nature only have an ordinal ranking and where meaningful distance information between game state values is not available, a numerical reward signal is necessarily biased. In this paper, we take a look at Monte Carlo Tree Search (MCTS), a popular algorithm for solving Markov decision processes (MDPs), highlight a recurring problem concerning its use of rewards, and show that an ordinal treatment of the rewards overcomes this problem. Using the General Video Game Playing framework, we show that our newly proposed ordinal MCTS algorithm dominates preference-based MCTS, vanilla MCTS, and various other MCTS variants.
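The sensitivity to reward encoding described above can be illustrated with a minimal sketch. The rollout data, the function names, and the pairwise comparison rule are all assumptions made for illustration; the comparison rule in particular is a generic order-based score and is not claimed to be the algorithm proposed in the paper. The sketch ranks two hypothetical actions first by mean numerical reward and then using only the ordering of outcomes.

```python
from itertools import product

# Hypothetical rollout outcomes for two actions (illustrative data, not from the paper).
ROLLOUTS = {
    "A": ["win"] * 4 + ["loss"] * 6,  # risky action: wins 4 of 10 rollouts
    "B": ["draw"] * 10,               # safe action: always draws
}

# Two encodings that agree on the order loss < draw < win but differ in distances.
ENCODING_SYMMETRIC = {"win": 1.0, "draw": 0.0, "loss": -1.0}
ENCODING_SOFT_LOSS = {"win": 1.0, "draw": 0.0, "loss": -0.5}


def mean_value(outcomes, encoding):
    """Average numerical reward, as used by a mean-based value estimate."""
    return sum(encoding[o] for o in outcomes) / len(outcomes)


def best_by_mean(rollouts, encoding):
    """Pick the action with the highest mean reward."""
    return max(rollouts, key=lambda a: mean_value(rollouts[a], encoding))


def pairwise_score(own, other, encoding):
    """Fraction of outcome pairs in which `own` beats `other`; ties count half.

    Only comparisons are used, so any order-preserving encoding gives the same score.
    """
    total = 0.0
    for x, y in product(own, other):
        if encoding[x] > encoding[y]:
            total += 1.0
        elif encoding[x] == encoding[y]:
            total += 0.5
    return total / (len(own) * len(other))


def best_by_order(rollouts, encoding):
    """Pick the action whose outcomes most often beat the outcomes of the others."""
    def score(action):
        others = [o for b, outs in rollouts.items() if b != action for o in outs]
        return pairwise_score(rollouts[action], others, encoding)
    return max(rollouts, key=score)


if __name__ == "__main__":
    for name, enc in [("symmetric", ENCODING_SYMMETRIC), ("soft loss", ENCODING_SOFT_LOSS)]:
        print(name, "mean-based:", best_by_mean(ROLLOUTS, enc),
              "order-based:", best_by_order(ROLLOUTS, enc))
```

With this made-up data, the mean-based choice switches from the safe action to the risky one when the loss value changes from -1 to -0.5, while the order-based choice stays the same under both encodings, because it never uses the distance between reward values.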

