Leveraging Offline Data in Online Reinforcement Learning

by Andrew Wagenmaker et al.

Two central paradigms have emerged in the reinforcement learning (RL) community: online RL and offline RL. In the online RL setting, the agent has no prior knowledge of the environment and must interact with it in order to find an ϵ-optimal policy. In the offline RL setting, the learner instead has access to a fixed dataset to learn from, but is unable to otherwise interact with the environment, and must obtain the best policy it can from this offline data. Practical scenarios often motivate an intermediate setting: if we have some set of offline data and, in addition, may also interact with the environment, how can we best use the offline data to minimize the number of online interactions necessary to learn an ϵ-optimal policy? In this work, we consider this intermediate setting for MDPs with linear structure. We characterize the number of online samples needed in this setting given access to some offline dataset, and develop an algorithm, FTPedel, which is provably optimal. We show through an explicit example that combining offline data with online interactions can lead to a provable improvement over either purely offline or purely online RL. Finally, our results illustrate the distinction between verifiable learning, the typical setting considered in online RL, and unverifiable learning, the setting often considered in offline RL, and show that there is a formal separation between these regimes.
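To make the hybrid setting concrete, the following is a minimal sketch, not the paper's FTPedel algorithm: tabular Q-learning that first replays a fixed offline dataset (no environment access), then refines the resulting value estimates with a budget of online interactions. The toy chain MDP, dataset coverage, and hyperparameters are all illustrative assumptions.

```python
# Hedged sketch of offline + online RL: warm-start from offline data,
# then spend a limited budget of online environment steps.
# Not the FTPedel algorithm from the paper -- an illustrative stand-in.
import random

N_STATES, N_ACTIONS, GAMMA, ALPHA = 3, 2, 0.9, 0.5

def step(s, a):
    """Toy deterministic chain MDP (assumed for illustration):
    action 1 moves right, action 0 stays; reaching the last state
    yields reward 1 and resets the agent to state 0."""
    s2 = s + 1 if (a == 1 and s < N_STATES - 1) else s
    r = 1.0 if s2 == N_STATES - 1 else 0.0
    return r, 0 if s2 == N_STATES - 1 else s2

def q_update(Q, s, a, r, s2):
    """Standard one-step Q-learning update."""
    Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])

def train(offline_data, online_steps, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    # Phase 1 (offline): sweep repeatedly over the fixed dataset.
    # No new environment interaction happens here.
    for _ in range(50):
        for (s, a, r, s2) in offline_data:
            q_update(Q, s, a, r, s2)
    # Phase 2 (online): epsilon-greedy interaction, warm-started from Q.
    # `online_steps` is the online sample budget the hybrid setting
    # tries to minimize.
    s = 0
    for _ in range(online_steps):
        if rng.random() < 0.2:
            a = rng.randrange(N_ACTIONS)
        else:
            a = max(range(N_ACTIONS), key=lambda x: Q[s][x])
        r, s2 = step(s, a)
        q_update(Q, s, a, r, s2)
        s = s2
    return Q

# Illustrative offline dataset: one transition per (state, action) pair
# from the non-terminal states.
offline = []
for s in range(N_STATES - 1):
    for a in range(N_ACTIONS):
        r, s2 = step(s, a)
        offline.append((s, a, r, s2))
```

With a reasonable offline dataset, the online phase starts from informative value estimates rather than zeros, which is the intuition behind the paper's claim that the combination can beat either regime alone.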



Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning

Recent theoretical work studies sample-efficient reinforcement learning ...

Adaptive Policy Learning for Offline-to-Online Reinforcement Learning

Conventional reinforcement learning (RL) needs an environment to collect...

SCORE: Spurious COrrelation REduction for Offline Reinforcement Learning

Offline reinforcement learning (RL) aims to learn the optimal policy fro...

Offline RL With Resource Constrained Online Deployment

Offline reinforcement learning is used to train policies in scenarios wh...

OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning

Reinforcement learning (RL) has achieved impressive performance in a var...

Corruption-Robust Offline Reinforcement Learning

We study the adversarial robustness in offline reinforcement learning. G...

Wall Street Tree Search: Risk-Aware Planning for Offline Reinforcement Learning

Offline reinforcement learning (RL) algorithms learn to make decisions u...
