Polynomial-time Algorithms for Combinatorial Pure Exploration with Full-bandit Feedback

02/27/2019
by Yuko Kuroki, et al.

We study the problem of stochastic combinatorial pure exploration (CPE), where an agent sequentially pulls a set of single arms (a super arm) and tries to identify the best super arm. Among the various CPE settings, we focus on the full-bandit setting, where the agent cannot observe the reward of each single arm, only the sum of the rewards. Although CPE with full-bandit feedback can be regarded as a special case of pure exploration in linear bandits, an approach based on linear bandits is not computationally feasible because the number of super arms may be exponential. In this paper, we first propose a polynomial-time bandit algorithm for CPE under general combinatorial constraints and provide an upper bound on its sample complexity. Second, we design an approximation algorithm for the 0-1 quadratic maximization problem, which arises in many bandit algorithms that use confidence ellipsoids. Based on this approximation algorithm, we propose novel bandit algorithms for the top-k selection problem and prove that they run in polynomial time. Finally, we conduct experiments on synthetic and real-world datasets that confirm our theoretical analysis in terms of both computation time and sample complexity.
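The full-bandit setting and the exponential number of super arms can be illustrated with a minimal sketch. This is not the paper's algorithm: the names `pull_super_arm` and `brute_force_best_top_k`, the Gaussian per-arm noise, and the toy mean vector `theta` are assumptions made purely for illustration.

```python
import itertools
import math
import random


def pull_super_arm(theta, super_arm, noise_std=1.0, rng=random):
    """Full-bandit feedback (illustrative): return only the noisy SUM of the
    rewards of the chosen single arms, never the individual rewards.
    Gaussian noise per arm is an assumption of this sketch."""
    return sum(theta[i] + rng.gauss(0.0, noise_std) for i in super_arm)


def brute_force_best_top_k(theta, k):
    """Enumerate every size-k super arm and return the one with the largest
    total mean reward. Uses the true means, so it is an oracle baseline and
    is feasible only for very small numbers of arms."""
    candidates = itertools.combinations(range(len(theta)), k)
    return max(candidates, key=lambda s: sum(theta[i] for i in s))


# Hypothetical mean rewards of four single arms.
theta = [0.1, 0.9, 0.5, 0.7]

best = brute_force_best_top_k(theta, 2)            # best pair of arms
observed = pull_super_arm(theta, best, noise_std=0.0)  # noiseless sum for clarity

# Number of candidate super arms in top-k selection is C(n, k).
small_instance = math.comb(4, 2)
large_instance = math.comb(50, 25)
```

With only 4 arms and k = 2 there are 6 super arms to enumerate, but with 50 arms and k = 25 the count is on the order of 10^14, which is why enumeration over super arms does not scale.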


