Lifelong Learning in Multi-Armed Bandits

12/28/2020
by Matthieu Jedor, et al.

Continuously learning and leveraging the knowledge accumulated from prior tasks in order to improve future performance is a long-standing machine learning problem. In this paper, we study the problem in the multi-armed bandit framework with the objective of minimizing the total regret incurred over a series of tasks. While most bandit algorithms are designed to have a low worst-case regret, we examine here the average regret over bandit instances drawn from some prior distribution, which may change over time. We specifically focus on confidence interval tuning of UCB algorithms. We propose a bandit-over-bandit approach with greedy algorithms and we perform extensive experimental evaluations in both stationary and non-stationary environments. We further apply our solution to the mortal bandit problem, showing empirical improvement over previous work.
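To make the idea of "confidence interval tuning of UCB via a bandit-over-bandit layer" concrete, below is a minimal sketch, not the authors' exact algorithm: a base UCB learner whose exploration coefficient is exposed as a parameter, and a greedy meta-level loop that, across a sequence of tasks drawn from an unknown prior, reuses the coefficient that has performed best so far. The candidate grid, the Beta prior, and the function names (`run_ucb`, `bandit_over_bandits`) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def run_ucb(means, horizon, c, rng):
    """Run UCB with exploration coefficient c on one Bernoulli bandit task.

    The index of arm i at time t is  mean_i + c * sqrt(log(t) / n_i);
    c = sqrt(2) recovers a standard worst-case tuning, while smaller c is
    greedier and can fare better on instances drawn from a benign prior.
    Returns the total reward collected over the horizon.
    """
    k = len(means)
    counts = np.zeros(k)
    sums = np.zeros(k)
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:                      # pull each arm once to initialize
            arm = t - 1
        else:
            ucb = sums / counts + c * np.sqrt(np.log(t) / counts)
            arm = int(np.argmax(ucb))
        reward = float(rng.random() < means[arm])
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total


def bandit_over_bandits(task_sampler, n_tasks, horizon, cs, seed=0):
    """Greedy meta-level choice of the UCB coefficient across a sequence of tasks.

    Each task is a fresh bandit instance drawn from the (unknown) prior via
    task_sampler; the meta-learner keeps a running average reward for each
    candidate coefficient and greedily reuses the best one seen so far.
    """
    rng = np.random.default_rng(seed)
    meta_sums = np.zeros(len(cs))
    meta_counts = np.zeros(len(cs))
    for task in range(n_tasks):
        if task < len(cs):              # try each candidate coefficient once
            j = task
        else:                           # then exploit the best-performing one
            j = int(np.argmax(meta_sums / meta_counts))
        means = task_sampler(rng)
        reward = run_ucb(means, horizon, cs[j], rng)
        meta_sums[j] += reward
        meta_counts[j] += 1
    return cs[int(np.argmax(meta_sums / meta_counts))]


if __name__ == "__main__":
    # Assumed prior for illustration: 5-armed Bernoulli bandits with Beta(1, 3) means.
    prior = lambda rng: rng.beta(1.0, 3.0, size=5)
    best_c = bandit_over_bandits(prior, n_tasks=200, horizon=500,
                                 cs=[0.1, 0.5, 1.0, float(np.sqrt(2))])
    print("best exploration coefficient found:", best_c)
```

Under a stationary prior, the greedy meta-level converges to a single coefficient; handling the non-stationary setting studied in the paper would additionally require discounting or resetting the meta-level statistics.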

