Non-Asymptotic Pure Exploration by Solving Games

06/25/2019
by   Rémy Degenne, et al.

Pure exploration (also known as active testing) is the fundamental task of sequentially gathering information to answer a query about a stochastic environment. Good algorithms make few mistakes and take few samples. Lower bounds (for multi-armed bandit models with arms in an exponential family) reveal that the sample complexity is determined by the solution to an optimisation problem. Existing state-of-the-art algorithms achieve asymptotic optimality by solving a plug-in estimate of that optimisation problem at each step. We interpret the optimisation problem as an unknown game, and propose sampling rules based on iterative strategies that estimate and converge to its saddle point. We apply no-regret learners to obtain the first finite-confidence guarantees that are adapted to the exponential family and that apply to any pure exploration query and bandit structure. Moreover, our algorithms only use a best-response oracle instead of fully solving the optimisation problem.
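The core idea of converging to a saddle point with a no-regret learner playing against a best-response oracle can be illustrated on a simple zero-sum matrix game. The sketch below (our own illustrative code, not the paper's algorithm; function name, learning-rate schedule, and iteration count are our choices) runs Hedge (exponential weights) for the max player while the min player responds with a best-response oracle; the averaged play approximates the game's value and saddle point.

```python
import numpy as np

def solve_game(A, iters=5000):
    """Approximate the value of the zero-sum game max_x min_y x^T A y.

    The max player runs Hedge (exponential weights, a no-regret
    learner); the min player answers with a best-response oracle.
    Averages of play converge to a saddle point at a ~1/sqrt(T) rate.
    Illustrative sketch only; constants are not from the paper.
    """
    n, _ = A.shape
    log_w = np.zeros(n)          # Hedge log-weights for the max player
    x_avg = np.zeros(n)
    vals = []
    for t in range(1, iters + 1):
        x = np.exp(log_w - log_w.max())
        x /= x.sum()                         # current mixed strategy
        j = int(np.argmin(x @ A))            # oracle: best response to x
        vals.append(float((x @ A)[j]))       # realised min-player value
        x_avg += x
        log_w += np.sqrt(np.log(n) / t) * A[:, j]  # anytime Hedge step
    return float(np.mean(vals)), x_avg / iters

# Matching-pennies-like game: the value is 0.5, attained by uniform play.
A = np.eye(2)
value_est, x_bar = solve_game(A)
```

Because each round's value `min_y x @ A y` lower-bounds the game value, the running average approaches the value from below at the learner's regret rate; this mirrors how the paper's sampling rules only need a best-response oracle rather than an exact solver at each step.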


Related research

05/04/2018  Combinatorial Pure Exploration with Continuous and Separable Reward Functions and Its Applications (Extended Version)
We study the Combinatorial Pure Exploration problem with Continuous and ...

06/18/2022  Thompson Sampling for (Combinatorial) Pure Exploration
Existing methods of combinatorial pure exploration mainly focus on the U...

07/02/2020  Structure Adaptive Algorithms for Stochastic Bandits
We study reward maximisation in a wide class of structured stochastic mu...

02/21/2020  Double Explore-then-Commit: Asymptotic Optimality and Beyond
We study the two-armed bandit problem with subGaussian rewards. The expl...

01/21/2021  Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback
Combinatorial bandits with semi-bandit feedback generalize multi-armed b...

06/30/2020  Forced-exploration free Strategies for Unimodal Bandits
We consider a multi-armed bandit problem specified by a set of Gaussian ...

07/01/2023  Adaptive Algorithms for Relaxed Pareto Set Identification
In this paper we revisit the fixed-confidence identification of the Pare...
