Variance-Dependent Best Arm Identification

06/19/2021
by Pinyan Lu, et al.

We study the problem of identifying the best arm in a stochastic multi-armed bandit game. Given a set of n arms indexed from 1 to n, each arm i is associated with an unknown reward distribution supported on [0,1] with mean θ_i and variance σ_i^2. Assume θ_1 > θ_2 ≥ ⋯ ≥ θ_n. We propose an adaptive algorithm that explores the gaps and variances of the arms' rewards and makes future decisions based on the gathered information, using a novel approach called grouped median elimination. The proposed algorithm is guaranteed to output the best arm with probability at least 1-δ and uses at most O(∑_{i=1}^n (σ_i^2/Δ_i^2 + 1/Δ_i)(ln δ^{-1} + ln ln Δ_i^{-1})) samples, where Δ_i (i ≥ 2) denotes the reward gap between arm i and the best arm, and we define Δ_1 = Δ_2. This is a significant improvement over variance-independent algorithms in some favorable scenarios, and it is the first result that removes the extra ln n factor on the best arm compared with the state of the art. We further show that Ω(∑_{i=1}^n (σ_i^2/Δ_i^2 + 1/Δ_i) ln δ^{-1}) samples are necessary for any algorithm to achieve the same goal, which shows that our algorithm is optimal up to doubly logarithmic terms.
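To make the "favorable scenarios" concrete, the following minimal Python sketch evaluates the variance-dependent quantity ∑_i (σ_i^2/Δ_i^2 + 1/Δ_i) ln δ^{-1} from the lower bound above and compares it with the commonly cited variance-independent quantity ∑_i Δ_i^{-2} ln δ^{-1}. It is only an illustration of the formulas, not the grouped median elimination algorithm. The function names, the example means and variances, and the choice to drop all constants and the doubly logarithmic term are assumptions made for this sketch.

```python
import math


def gaps(means):
    """Return Delta_i for every arm, with Delta_best defined as the gap to the runner-up."""
    order = sorted(range(len(means)), key=lambda i: means[i], reverse=True)
    best, runner_up = order[0], order[1]
    top_gap = means[best] - means[runner_up]
    if top_gap <= 0:
        raise ValueError("the best arm must be unique")
    return [top_gap if i == best else means[best] - means[i] for i in range(len(means))]


def variance_dependent_complexity(means, variances, delta):
    """sum_i (sigma_i^2 / Delta_i^2 + 1 / Delta_i) * ln(1/delta), constants omitted."""
    total = sum(v / g ** 2 + 1.0 / g for v, g in zip(variances, gaps(means)))
    return total * math.log(1.0 / delta)


def variance_independent_complexity(means, delta):
    """sum_i Delta_i^-2 * ln(1/delta), constants omitted (assumed baseline for comparison)."""
    total = sum(1.0 / g ** 2 for g in gaps(means))
    return total * math.log(1.0 / delta)


if __name__ == "__main__":
    # Hypothetical instance: small gaps and very small variances.
    means = [0.9, 0.8, 0.8, 0.8]
    variances = [0.001, 0.001, 0.001, 0.001]
    delta = 0.05

    dep = variance_dependent_complexity(means, variances, delta)
    indep = variance_independent_complexity(means, delta)
    print(f"variance-dependent:   {dep:.1f}")
    print(f"variance-independent: {indep:.1f}")
    print(f"ratio:                {indep / dep:.2f}")
```

In this hypothetical instance every gap is 0.1, so each arm contributes roughly 0.1 + 10 to the variance-dependent sum and 100 to the variance-independent one. The advantage shrinks as the variances approach their maximum of 1/4 for rewards in [0,1].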
