Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs

06/14/2020
by Chung-Wei Lee, et al.

We develop a new approach to obtaining high-probability regret bounds for online learning with bandit feedback against an adaptive adversary. While existing approaches all require carefully constructing optimistic and biased loss estimators, our approach uses standard unbiased estimators and relies on a simple increasing learning rate schedule, together with logarithmically homogeneous self-concordant barriers and a strengthened Freedman's inequality. Besides its simplicity, our approach enjoys several advantages. First, the obtained high-probability regret bounds are data-dependent and can be much smaller than the worst-case bounds, which resolves an open problem posed by Neu (2015). Second, resolving another open problem posed by Bartlett et al. (2008) and Abernethy and Rakhlin (2009), our approach leads to the first general and efficient algorithm with a high-probability regret bound for adversarial linear bandits, whereas previous methods are either inefficient or applicable only to specific action sets. Finally, our approach can also be applied to learning adversarial Markov Decision Processes and provides the first algorithm with a high-probability small-loss bound for this problem.
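As an illustration only, and not the authors' algorithm, here is a minimal sketch of how the ingredients named above can fit together in the simplest setting, a K-armed bandit. It combines the standard importance-weighted loss estimator (which is unbiased), online mirror descent over the simplex with a log-barrier regularizer (-ln w_i is the textbook example of a logarithmically homogeneous self-concordant barrier for the positive orthant), and per-arm learning rates that only increase. All function names are hypothetical. The following details are assumptions chosen for illustration: the trigger for raising a learning rate (whenever 1/w_i exceeds a threshold rho_i, set rho_i = 2/w_i and multiply eta_i by kappa > 1), the initial values of eta and rho, and kappa = exp(1/ln T). The simulation plays against a fixed loss table rather than an adaptive adversary, and the high-probability analysis based on Freedman's inequality is not reproduced.

```python
import math
import random


def importance_weighted_estimate(observed_loss, arm, probs):
    """Standard unbiased estimator: loss / p at the pulled arm, 0 elsewhere.

    For every arm i, E[estimate_i] = p_i * (loss_i / p_i) = loss_i.
    """
    est = [0.0] * len(probs)
    est[arm] = observed_loss / probs[arm]
    return est


def log_barrier_step(w, loss_hat, eta, iters=100):
    """One OMD step on the simplex with psi(w) = sum_i (1 / eta_i) * ln(1 / w_i).

    The first-order conditions give 1/w'_i = 1/w_i + eta_i * (loss_hat_i + lam),
    where the multiplier lam is found by bisection so that sum(w') = 1.
    Assumes loss_hat_i >= 0, eta_i > 0 and w_i > 0.
    """
    def candidate(lam):
        return [1.0 / (1.0 / wi + e * (l + lam)) for wi, l, e in zip(w, loss_hat, eta)]

    # Just above lo, some denominator tends to 0+, so the total mass exceeds 1.
    lo = max(-1.0 / (e * wi) - l for wi, l, e in zip(w, loss_hat, eta))
    # At lam = 0 every w'_i <= w_i (estimates are non-negative), so mass <= 1.
    hi = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if sum(candidate(mid)) > 1.0:
            lo = mid  # too much mass: the multiplier must increase
        else:
            hi = mid
    w_next = candidate(hi)
    total = sum(w_next)
    return [x / total for x in w_next]  # remove leftover floating-point drift


def increase_learning_rates(w, eta, rho, kappa):
    """Hypothetical schedule: when 1/w_i exceeds rho_i, set rho_i = 2 / w_i and
    multiply eta_i by kappa > 1. Learning rates never decrease."""
    for i, wi in enumerate(w):
        if 1.0 / wi > rho[i]:
            rho[i] = 2.0 / wi
            eta[i] *= kappa


def run(losses, eta0=0.1, seed=0):
    """Play T = len(losses) >= 2 rounds against a fixed table losses[t][i] in [0, 1]."""
    rng = random.Random(seed)
    T, K = len(losses), len(losses[0])
    w = [1.0 / K] * K
    eta = [eta0] * K
    rho = [2.0 * K] * K
    kappa = math.exp(1.0 / math.log(T))  # assumed constant, for illustration only
    total_loss = 0.0
    for t in range(T):
        arm = rng.choices(range(K), weights=w)[0]
        loss = losses[t][arm]
        total_loss += loss
        w = log_barrier_step(w, importance_weighted_estimate(loss, arm, w), eta)
        increase_learning_rates(w, eta, rho, kappa)
    return total_loss, w, eta
```

Against a table in which one arm always has loss 0 and the other always has loss 1, the weight on the first arm should approach 1, and only the learning rate of the arm whose probability keeps shrinking should grow. Bisection for the normalization multiplier is just a simple choice; any one-dimensional root finder works, since the total mass is monotone in the multiplier.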
