Thompson Sampling for Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit

by Shintaro Nakamura, et al.

We study the real-valued combinatorial pure exploration of the multi-armed bandit (R-CPE-MAB) problem. In R-CPE-MAB, a player is given d stochastic arms, and the reward of each arm s ∈ {1, …, d} follows an unknown distribution with mean μ_s. In each time step, the player pulls a single arm and observes its reward. The player's goal is to identify the optimal action π^* = argmax_{π∈𝒜} μ^⊤π from a finite-sized real-valued action set 𝒜 ⊂ ℝ^d with as few arm pulls as possible. Previous methods for R-CPE-MAB assume that the size of the action set 𝒜 is polynomial in d. We introduce an algorithm named the Generalized Thompson Sampling Explore (GenTS-Explore) algorithm, which is the first algorithm that can work even when the size of the action set is exponentially large in d. We also introduce a novel problem-dependent sample complexity lower bound for the R-CPE-MAB problem, and show that the GenTS-Explore algorithm achieves the optimal sample complexity up to a problem-dependent constant factor.
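To make the problem setup concrete, the following is a minimal Python sketch of the R-CPE-MAB interaction, not the GenTS-Explore algorithm itself. It uses a naive uniform-sampling baseline over an explicitly enumerated action set, which is only feasible when 𝒜 is small. The Gaussian reward model, the means in `true_mu`, the toy `actions`, and the helper names `best_action` and `uniform_explore` are all assumptions introduced for illustration. Arms are indexed from 0 in the code, whereas the text indexes them from 1.

```python
import random

def best_action(mu, actions):
    """Return the action pi in `actions` that maximizes the inner product mu^T pi."""
    return max(actions, key=lambda pi: sum(m * p for m, p in zip(mu, pi)))

def uniform_explore(pull, d, actions, pulls_per_arm):
    """Naive baseline (an assumption, not the paper's method):
    pull every arm equally often, then return the empirically best action."""
    mu_hat = [
        sum(pull(s) for _ in range(pulls_per_arm)) / pulls_per_arm
        for s in range(d)
    ]
    return best_action(mu_hat, actions), mu_hat

# Hypothetical instance: d = 3 arms with unit-variance Gaussian rewards.
rng = random.Random(0)
true_mu = [1.0, 0.2, 0.6]
actions = [(1.0, 0.0, 0.5), (0.0, 2.0, 0.0), (0.3, 0.3, 0.5)]

def pull(s):
    """Pull arm s once and observe a noisy reward with mean true_mu[s]."""
    return rng.gauss(true_mu[s], 1.0)

chosen, mu_hat = uniform_explore(pull, d=3, actions=actions, pulls_per_arm=2000)
print("identified action:", chosen)
```

The baseline spends the same number of pulls on every arm and must enumerate all of 𝒜 to take the argmax, which is exactly what becomes impractical when the action set is exponentially large in d.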




Combinatorial Pure Exploration of Multi-Armed Bandit with a Real Number Action Class

The combinatorial pure exploration (CPE) in the stochastic multi-armed b...

Thompson Sampling for (Combinatorial) Pure Exploration

Existing methods of combinatorial pure exploration mainly focus on the U...

Differential Good Arm Identification

This paper targets a variant of the stochastic multi-armed bandit proble...

A unified framework for bandit multiple testing

In bandit multiple hypothesis testing, each arm corresponds to a differe...

The Max K-Armed Bandit: A PAC Lower Bound and tighter Algorithms

We consider the Max K-Armed Bandit problem, where a learning agent is fa...

Combinatorial Pure Exploration of Dueling Bandit

In this paper, we study combinatorial pure exploration for dueling bandi...

An Asymptotically Optimal Algorithm for the One-Dimensional Convex Hull Feasibility Problem

This work studies the pure-exploration setting for the convex hull feasi...
