Efficient Algorithms for Finite Horizon and Streaming Restless Multi-Armed Bandit Problems

by Aditya Mate, et al.

Restless Multi-Armed Bandits (RMABs) are widely used to model limited resource allocation problems, and have recently been applied to health monitoring and intervention planning. However, existing approaches fail to account for the arrival of new patients and the departure of enrolled patients from a treatment program. To address this challenge, we formulate a streaming bandit (S-RMAB) framework, a generalization of RMABs in which heterogeneous arms arrive and leave under possibly random streams. We propose a new and scalable approach to computing index-based solutions. We start by proving that index values decrease for short residual lifetimes, a phenomenon that we call index decay. We then provide algorithms designed to capture index decay without having to solve the costly finite horizon problem, thereby lowering the computational complexity compared to existing methods. We evaluate our approach via simulations run on real-world data obtained from a tuberculosis intervention planning task as well as on multiple synthetic domains. Our algorithms achieve a speed-up of more than 150x over existing methods in these tasks without loss in performance. These findings are robust across multiple domains.
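To make the streaming setting concrete, the following is a minimal sketch of a generic index policy over arms that arrive and depart. It is not the algorithm from the paper. The names `Arm`, `active_arms`, `select_arms`, and `toy_index` are hypothetical, and `toy_index` is a made-up placeholder that merely shrinks as the residual lifetime gets short, imitating the qualitative shape of index decay.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Arm:
    arm_id: int
    arrival: int    # first time step at which the arm is present
    departure: int  # first time step at which the arm is no longer present


def active_arms(arms: List[Arm], t: int) -> List[Arm]:
    """Return the arms that are enrolled at time step t."""
    return [a for a in arms if a.arrival <= t < a.departure]


def select_arms(
    arms: List[Arm],
    t: int,
    budget: int,
    index_fn: Callable[[Arm, int], float],
) -> List[int]:
    """Pull the `budget` active arms with the largest index values.

    index_fn(arm, residual) is assumed to return the arm's index given
    its residual lifetime (number of time steps left before departure).
    """
    candidates = active_arms(arms, t)
    ranked = sorted(
        candidates,
        key=lambda a: index_fn(a, a.departure - t),
        reverse=True,
    )
    return [a.arm_id for a in ranked[:budget]]


def toy_index(arm: Arm, residual: int) -> float:
    """Hypothetical placeholder index, NOT the paper's index computation.

    It saturates for long residual lifetimes and shrinks for short ones.
    """
    base = 1.0 + 0.1 * arm.arm_id
    return base * (1.0 - 0.5 ** residual)


if __name__ == "__main__":
    arms = [Arm(0, 0, 10), Arm(1, 0, 2), Arm(2, 3, 8)]
    for t in (1, 5):
        print(t, select_arms(arms, t, budget=2, index_fn=toy_index))
```

In this sketch, an arm that is about to leave is ranked lower than it would be with a long remaining lifetime, so the limited budget tends to go to arms that can still benefit from an intervention. Any real use would replace `toy_index` with an actual index computation.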




Indexability of Finite State Restless Multi-Armed Bandit and Rollout Policy

We consider finite state restless multi-armed bandit problem. The decisi...

Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits

I analyse the frequentist regret of the famous Gittins index strategy fo...

Finite-time Analysis of Globally Nonstationary Multi-Armed Bandits

We consider nonstationary multi-armed bandit problems where the model pa...

Fairness for Workers Who Pull the Arms: An Index Based Policy for Allocation of Restless Bandit Tasks

Motivated by applications such as machine repair, project monitoring, an...

Limited Resource Allocation in a Non-Markovian World: The Case of Maternal and Child Healthcare

The success of many healthcare programs depends on participants' adheren...

Speed Up the Cold-Start Learning in Two-Sided Bandits with Many Arms

Multi-armed bandit (MAB) algorithms are efficient approaches to reduce t...

Learn to Intervene: An Adaptive Learning Policy for Restless Bandits in Application to Preventive Healthcare

In many public health settings, it is important for patients to adhere t...
