Combinatorial Pure Exploration of Multi-Armed Bandit with a Real Number Action Class

06/15/2023
by Shintaro Nakamura, et al.

Combinatorial pure exploration (CPE) in the stochastic multi-armed bandit (MAB) setting is a well-studied online decision-making problem: a player wants to find the optimal action π^* from an action class 𝒜, which is a collection of subsets of arms with a certain combinatorial structure. Although CPE can represent many combinatorial structures, such as paths, matchings, and spanning trees, most existing works focus only on a binary action class 𝒜⊆{0, 1}^d for some positive integer d. This binary formulation excludes important problems such as optimal transport, knapsack, and production planning. To overcome this limitation, we extend the binary formulation to a real-valued action class 𝒜⊆ℝ^d, a setting we refer to as R-CPE-MAB, and propose a new algorithm. The only assumption we make is that the number of actions in 𝒜 is polynomial in d. We show an upper bound on the sample complexity of our algorithm and an action class-dependent lower bound for R-CPE-MAB by introducing a quantity that characterizes the problem's difficulty and generalizes the notion of width introduced in Chen et al. [2014].
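To make the difference between a binary and a real-valued action class concrete, the following is a minimal sketch of the setting, not the algorithm proposed in the paper. It assumes the standard CPE-MAB objective of maximizing the linear reward μ^T π over π ∈ 𝒜, unit-variance Gaussian rewards, and a naive uniform-sampling baseline. All function names and numeric values are hypothetical and chosen only for illustration.

```python
import random


def linear_reward(mu, action):
    """Expected reward of an action under the assumed linear objective mu^T pi."""
    return sum(m * a for m, a in zip(mu, action))


def best_action(mu, actions):
    """Index of the action in the class that maximizes the linear reward."""
    return max(range(len(actions)), key=lambda i: linear_reward(mu, actions[i]))


def uniform_sampling_baseline(true_mu, actions, pulls_per_arm, rng):
    """Naive baseline (assumption, not the paper's method): pull every arm
    equally often, average the observations, and return the empirically
    best action."""
    mu_hat = []
    for mean in true_mu:
        # Assumed noise model: Gaussian rewards with unit variance.
        total = sum(rng.gauss(mean, 1.0) for _ in range(pulls_per_arm))
        mu_hat.append(total / pulls_per_arm)
    return best_action(mu_hat, actions)


# Hypothetical instance with d = 3 arms.
true_mu = [1.0, 0.5, 0.2]

# Binary action class: each action is a subset of arms, encoded in {0, 1}^d.
binary_actions = [[1, 0, 0], [0, 1, 0], [0, 1, 1]]

# Real-valued action class: entries are arbitrary reals, e.g. quantities
# in a knapsack or production-planning problem.
real_actions = [[1.0, 0.0, 0.0], [0.0, 2.5, 0.0], [0.3, 0.3, 1.0]]

binary_best = best_action(true_mu, binary_actions)
real_best = best_action(true_mu, real_actions)
estimated_best = uniform_sampling_baseline(
    true_mu, real_actions, pulls_per_arm=5000, rng=random.Random(0)
)
```

In this toy instance the best binary action selects only the first arm, while the best real-valued action puts weight 2.5 on the second arm, even though that arm has a smaller mean. Uniform sampling ignores the structure of 𝒜, which is the kind of inefficiency an action class-dependent analysis is meant to quantify.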
