Maximin Action Identification: A New Bandit Framework for Games

02/15/2016
by Aurélien Garivier, et al.

We study an original pure-exploration problem in a strategic bandit model motivated by Monte Carlo Tree Search. The goal is to identify the best action in a game when the player may sample random outcomes of sequentially chosen pairs of actions. We propose two strategies for the fixed-confidence setting: Maximin-LUCB, based on lower- and upper-confidence bounds, and Maximin-Racing, which operates by successively eliminating sub-optimal actions. We discuss the sample complexity of both methods and compare their performance empirically. We also sketch a lower-bound analysis and possible connections to an optimal algorithm.
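The abstract only names the two strategies, so as a rough illustration here is a minimal Python sketch of what a Maximin-LUCB-style sampling rule could look like. The class name MaximinLUCBSketch, the Hoeffding-type confidence bonus, and the leader/challenger selection below are assumptions made for illustration, not the authors' actual algorithm or stopping rule.

```python
import math

# Illustrative sketch of a Maximin-LUCB-style sampling rule for maximin
# action identification. Assumptions (not from the paper): outcomes lie
# in [0, 1], bonuses are Hoeffding-type, and the leader/challenger choice
# is one plausible instantiation rather than the authors' exact rule.

class MaximinLUCBSketch:
    def __init__(self, n_rows, n_cols, delta=0.05):
        self.n_rows = n_rows    # player's candidate actions
        self.n_cols = n_cols    # opponent's replies
        self.delta = delta      # confidence parameter
        self.counts = [[0] * n_cols for _ in range(n_rows)]
        self.sums = [[0.0] * n_cols for _ in range(n_rows)]
        self.t = 0

    def _bonus(self, n):
        # Hoeffding-style exploration bonus; infinite for unexplored pairs.
        if n == 0:
            return float("inf")
        return math.sqrt(math.log(max(self.t, 2) / self.delta) / (2 * n))

    def _mean(self, i, j):
        n = self.counts[i][j]
        return self.sums[i][j] / n if n else 0.0

    def select_pair(self):
        # The value of row i is min_j mu(i, j). The leader is the row with
        # the largest optimistic (UCB) maximin value; we then sample its
        # pessimistic (LCB) worst-case column.
        self.t += 1
        ucb_value = [
            min(self._mean(i, j) + self._bonus(self.counts[i][j])
                for j in range(self.n_cols))
            for i in range(self.n_rows)
        ]
        leader = max(range(self.n_rows), key=lambda i: ucb_value[i])
        column = min(
            range(self.n_cols),
            key=lambda j: self._mean(leader, j) - self._bonus(self.counts[leader][j]),
        )
        return leader, column

    def update(self, i, j, outcome):
        # Record one sampled game outcome for the pair (i, j).
        self.counts[i][j] += 1
        self.sums[i][j] += outcome
```

A fixed-confidence procedure would pair such a sampling rule with a stopping test, for instance comparing the leader's pessimistic maximin value against the optimistic values of the other rows; that part is deliberately omitted from this sketch.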


Related research:

- 03/13/2018 · Pure Exploration in Infinitely-Armed Bandit Models with Fixed-Confidence
- 06/16/2017 · Structured Best Arm Identification with Fixed Confidence
- 06/09/2017 · Monte-Carlo Tree Search by Best Arm Identification
- 08/09/2014 · Bandit Algorithms for Tree Search
- 07/22/2021 · Bandit Quickest Changepoint Detection
- 02/09/2019 · Pure Exploration with Multiple Correct Answers
- 11/19/2018 · Feature selection as Monte-Carlo Search in Growing Single Rooted Directed Acyclic Graph by Best Leaf Identification
