Corruption-Robust Offline Reinforcement Learning

06/11/2021
by   Xuezhou Zhang, et al.

We study adversarial robustness in offline reinforcement learning. Given a batch dataset of tuples (s, a, r, s'), an adversary may arbitrarily modify an ϵ fraction of the tuples. From the corrupted dataset, the learner aims to robustly identify a near-optimal policy. We first show that a worst-case Ω(dϵ) optimality gap is unavoidable in a linear MDP of dimension d, even if the adversary corrupts only the reward element of each tuple. This contrasts with dimension-free results in robust supervised learning and with the best-known lower bound in the online RL setting with corruption. Next, we propose robust variants of the Least-Squares Value Iteration (LSVI) algorithm that use robust supervised learning oracles and achieve near-matching performance both with and without full data coverage. In the no-coverage case, the algorithm requires knowledge of ϵ to design the pessimism bonus. Surprisingly, this knowledge is necessary: we show that adapting to an unknown ϵ is impossible. This again contrasts with recent results on corruption-robust online RL and implies that robust offline RL is a strictly harder problem.
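To make the setting concrete, here is a minimal sketch of ϵ-fraction reward corruption and a single LSVI-style regression step with an ϵ-scaled pessimism bonus. This is not the paper's actual algorithm: the function names, the specific corruption (negating rewards), and the bonus scaling are all illustrative assumptions on top of a standard linear-features setup.

```python
import numpy as np

def corrupt_rewards(rewards, eps, rng):
    """Adversary arbitrarily rewrites rewards in an eps fraction of tuples.
    Here the corruption simply negates the chosen rewards (illustrative)."""
    out = rewards.copy()
    idx = rng.choice(len(rewards), size=int(eps * len(rewards)), replace=False)
    out[idx] = -out[idx]
    return out

def robust_lsvi_step(phi, rewards, eps, lam=1.0):
    """One regression step of an LSVI-style update on linear features phi,
    returning a pessimistic Q-estimate with an eps-scaled bonus (a stand-in
    for the paper's pessimism term, not its exact form)."""
    n, d = phi.shape
    Lambda = phi.T @ phi + lam * np.eye(d)        # regularized covariance
    w = np.linalg.solve(Lambda, phi.T @ rewards)  # ridge regression solution

    def q_pessimistic(x):
        # elliptical bonus ||x||_{Lambda^{-1}}, scaled by the corruption level
        bonus = eps * n * np.sqrt(x @ np.linalg.solve(Lambda, x))
        return float(x @ w - bonus)

    return q_pessimistic
```

With eps = 0 the bonus vanishes and the estimate reduces to plain ridge regression; a larger assumed corruption level produces a more pessimistic value estimate.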


Related research:

- Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning (06/09/2021)
- Byzantine-Robust Online and Offline Distributed Reinforcement Learning (06/01/2022)
- Leveraging Offline Data in Online Reinforcement Learning (11/09/2022)
- Distributionally Robust Offline Reinforcement Learning with Linear Function Approximation (09/14/2022)
- Robust Policy Gradient against Strong Data Corruption (02/11/2021)
- A Random Subspace Technique That Is Resistant to a Limited Number of Features Corrupted by an Adversary (02/19/2019)
- Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning (05/05/2022)
