Improved Corruption Robust Algorithms for Episodic Reinforcement Learning

02/13/2021
by Yifang Chen, et al.

We study episodic reinforcement learning under unknown adversarial corruptions in both the rewards and the transition probabilities of the underlying system. We propose new algorithms that, compared to the existing results of Lykouris et al. (2020), achieve strictly better regret bounds in terms of total corruption in the tabular setting. Specifically, first, our regret bounds depend on more precise numerical values of the total reward corruption and transition corruption, rather than only on the total number of corrupted episodes. Second, our regret bounds are the first of their kind in the reinforcement learning setting in which the number of corruptions appears additively with respect to √T rather than multiplicatively. Our results follow from a general algorithmic framework that combines corruption-robust policy elimination meta-algorithms with plug-in reward-free exploration sub-algorithms. Replacing the meta-algorithm or the sub-algorithm may extend the framework to other corrupted settings with potentially more structure.
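
To make the additive versus multiplicative distinction concrete, the two shapes of bound can be written schematically as below. This is only an illustrative sketch, not the bounds stated in the paper: the symbols Reg(T) (regret after T episodes), C (a total corruption measure), and g (an unspecified function of C) are assumptions introduced here, and all dependence on the number of states, actions, and the horizon is omitted.

```latex
% Schematic only: C, g, and Reg(T) are placeholder symbols, not the paper's notation.
\begin{align*}
  \text{multiplicative dependence:} \quad
    & \mathrm{Reg}(T) = \tilde{O}\bigl(C \cdot \sqrt{T}\bigr), \\
  \text{additive dependence:} \quad
    & \mathrm{Reg}(T) = \tilde{O}\bigl(\sqrt{T} + g(C)\bigr).
\end{align*}
% Illustration with T = 10^4, C = 10, and the hypothetical choice g(C) = C:
%   multiplicative: 10 * sqrt(10^4) = 1000
%   additive:       sqrt(10^4) + 10 = 110
```

In the additive form, the corruption term does not scale with √T, so a fixed amount of corruption costs the same regardless of how long the learner runs.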
