Nonlinear Sequential Accepts and Rejects for Identification of Top Arms in Stochastic Bandits

07/09/2017
by Shahin Shahrampour, et al.

We address the M-best-arm identification problem in multi-armed bandits. A player has a limited budget to explore K arms, and each pull of an arm yields a reward drawn independently from a fixed, unknown distribution. The goal is to find the top M arms (M<K) in terms of expected reward. We develop an algorithm that proceeds in rounds to deactivate arms iteratively. At each round, the budget is divided by a nonlinear function of the number of remaining arms, and the arms are pulled accordingly. Based on a decision rule, the arm deactivated at each round is either accepted or rejected. The algorithm outputs the accepted arms, which should ideally be the top M arms. We characterize the decay rate of the misidentification probability and establish that the nonlinear budget allocation is useful across different problem environments (described by the number of competitive arms). We provide comprehensive numerical experiments showing that, with a suitable choice of nonlinearity, our algorithm outperforms the state of the art.
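
The round structure described above can be illustrated with a minimal sketch. The abstract does not give the allocation function or the decision rule, so the code below makes several assumptions: a power-law split of the budget with exponent `p`, a normalizing constant `c_p` chosen so the total number of pulls stays within the budget, a gap-based accept/reject rule in the style of earlier sequential accept-and-reject methods, and Bernoulli rewards for the simulation. The function name `nonlinear_accept_reject` and all parameter names are hypothetical, and this should not be read as the authors' exact algorithm.

```python
import math
import random


def nonlinear_accept_reject(means, m, budget, p=1.0, seed=0):
    """Return the indices of m accepted arms (illustrative sketch only).

    means  : true Bernoulli means, used only to simulate pulls (assumption)
    m      : number of top arms to identify, with 1 <= m < len(means)
    budget : total number of pulls allowed, must exceed len(means)
    p      : exponent of the assumed power-law allocation, p > 0
    """
    k = len(means)
    if not (1 <= m < k) or budget <= k or p <= 0:
        raise ValueError("need 1 <= m < K, budget > K and p > 0")
    rng = random.Random(seed)
    # Assumed normalizer: keeps the cumulative pull count within the budget.
    c_p = 2.0 ** (-p) + sum(r ** (-p) for r in range(2, k + 1))
    active, accepted = list(range(k)), []
    slots = m                      # top-arm slots still to be filled
    pulls, sums = [0] * k, [0.0] * k
    n_prev = 0

    def mean(i):
        return sums[i] / pulls[i]

    for r in range(1, k):
        if slots == 0 or slots == len(active):
            break                  # fate of every remaining arm is decided
        # Cumulative pulls per active arm after round r (nonlinear in K - r + 1).
        n_r = math.ceil((budget - k) / (c_p * (k - r + 1) ** p))
        for i in active:
            for _ in range(n_r - n_prev):
                sums[i] += 1.0 if rng.random() < means[i] else 0.0
                pulls[i] += 1
        n_prev = n_r
        order = sorted(active, key=mean, reverse=True)
        # Empirical gaps of the best and worst arms to the current boundary.
        gap_best = mean(order[0]) - mean(order[slots])
        gap_worst = mean(order[slots - 1]) - mean(order[-1])
        if gap_best >= gap_worst:
            accepted.append(order[0])   # deactivate and accept the best arm
            active.remove(order[0])
            slots -= 1
        else:
            active.remove(order[-1])    # deactivate and reject the worst arm
    if slots > 0:
        accepted.extend(active)    # remaining arms exactly fill the open slots
    return sorted(accepted)


if __name__ == "__main__":
    print(nonlinear_accept_reject([0.9, 0.8, 0.3, 0.2, 0.1], m=2, budget=2000))
```

In this sketch, larger values of `p` shift pulls toward the later rounds, where fewer arms remain; which setting works best presumably depends on how many arms are competitive, as the abstract suggests.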

research
09/08/2016

On Sequential Elimination Algorithms for Best-Arm Identification in Multi-Armed Bandits

We consider the best-arm identification problem in multi-armed bandits, ...
research
02/08/2022

Budgeted Combinatorial Multi-Armed Bandits

We consider a budgeted combinatorial multi-armed bandit setting where, i...
research
06/13/2023

Fixed-Budget Best-Arm Identification with Heterogeneous Reward Variances

We study the problem of best-arm identification (BAI) in the fixed-budge...
research
03/06/2020

A Farewell to Arms: Sequential Reward Maximization on a Budget with a Giving Up Option

We consider a sequential decision-making problem where an agent can take...
research
11/15/2018

Pure-Exploration for Infinite-Armed Bandits with General Arm Reservoirs

This paper considers a multi-armed bandit game where the number of arms ...
research
06/09/2021

Fixed-Budget Best-Arm Identification in Contextual Bandits: A Static-Adaptive Algorithm

We study the problem of best-arm identification (BAI) in contextual band...
research
05/24/2022

Optimality Conditions and Algorithms for Top-K Arm Identification

We consider the top-k arm identification problem for multi-armed bandits...
