Best Arm Identification in Restless Markov Multi-Armed Bandits

03/29/2022
by P. N. Karthik, et al.

We study the problem of identifying the best arm in a multi-armed bandit environment in which each arm is a time-homogeneous, ergodic, discrete-time Markov process on a common finite state space. The state evolution on each arm is governed by the arm's transition probability matrix (TPM). A decision entity that knows the set of arm TPMs, but not the exact mapping of the TPMs to the arms, wishes to find the index of the best arm as quickly as possible, subject to an upper bound on the error probability. The decision entity selects one arm at a time sequentially, and all unselected arms continue to undergo state evolution (restless arms). For this problem, we derive the first known problem-instance-dependent asymptotic lower bound on the growth rate of the expected time required to find the index of the best arm, where the asymptotics are taken as the error probability vanishes. Further, we propose a sequential policy that, for an input parameter R, forcibly selects an arm that has not been selected for R consecutive time instants. We show that this policy achieves an upper bound that depends on R and is monotonically non-increasing in R. Whether, in general, the limiting value of the upper bound as R→∞ matches the lower bound remains open. We identify a special case in which the upper and lower bounds match. Prior works on best arm identification have dealt with (a) independent and identically distributed observations from the arms and (b) rested Markov arms, whereas our work deals with the more difficult setting of restless Markov arms.
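The following is a minimal Python sketch of two ingredients described in the abstract: restless Markov arms, where every arm transitions at every time instant whether or not it is selected, and the R-forced-selection rule. It is an illustration under stated assumptions, not the authors' policy. The names run_forced_selection and default_rule are hypothetical; every arm is assumed to start in state 0; ties among several overdue arms are assumed to be broken by lowest index; and the rule used when no arm is overdue (as well as the paper's stopping and decision rules) is replaced by a caller-supplied placeholder.

```python
import random


def step(state, tpm, rng):
    """Sample the next state of one arm from row `state` of its TPM."""
    row = tpm[state]
    return rng.choices(range(len(row)), weights=row, k=1)[0]


def run_forced_selection(tpms, R, horizon, default_rule, seed=0):
    """Simulate restless Markov arms under an R-forced-selection rule (sketch).

    tpms: list of K row-stochastic matrices (lists of lists), one per arm.
    R: an arm left unselected for R consecutive instants is selected next.
    default_rule: hypothetical callable (t, delays) -> arm index, used only
        when no arm is overdue (a stand-in for the paper's actual rule).
    Returns a list of (selected_arm, observed_state) pairs.
    """
    rng = random.Random(seed)
    K = len(tpms)
    states = [0] * K   # assumption: every arm starts in state 0
    delays = [0] * K   # consecutive instants each arm has gone unselected
    history = []
    for t in range(horizon):
        overdue = [a for a in range(K) if delays[a] >= R]
        # Forced selection takes priority; lowest index wins ties (assumption).
        arm = overdue[0] if overdue else default_rule(t, delays)
        history.append((arm, states[arm]))
        delays = [0 if a == arm else d + 1 for a, d in enumerate(delays)]
        # Restless arms: every arm transitions, selected or not.
        states = [step(s, tpms[a], rng) for a, s in enumerate(states)]
    return history


# Two arms whose state flips deterministically at every instant.
flip = [[0.0, 1.0], [1.0, 0.0]]
hist = run_forced_selection([flip, flip], R=3, horizon=8,
                            default_rule=lambda t, delays: 0)
print(hist)
# → [(0, 0), (0, 1), (0, 0), (1, 1), (0, 0), (0, 1), (0, 0), (1, 1)]
```

In the usage example the placeholder rule always picks arm 0, so arm 1 is sampled only when forced, at t = 3 and t = 7. Its observed state at t = 3 is 1 rather than its initial state 0, because it kept evolving while unselected; under rested arms it would have been observed in state 0. Note that when several arms become overdue at once, only one can be served per instant, so some arms may go more than R instants unselected in this sketch.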
