Policy Regularization with Dataset Constraint for Offline Reinforcement Learning

06/11/2023
by Yuhang Ran, et al.

We consider the problem of learning the best possible policy from a fixed dataset, known as offline Reinforcement Learning (RL). A common category of existing offline RL methods is policy regularization, which typically constrains the learned policy by the distribution or support of the behavior policy. However, distribution and support constraints are overly conservative, since both force the policy to choose actions similar to those of the behavior policy in each given state. This limits the learned policy's performance, especially when the behavior policy is sub-optimal. In this paper, we find that regularizing the policy towards the nearest state-action pair can be more effective, and thus propose Policy Regularization with Dataset Constraint (PRDC). When updating the policy in a given state, PRDC searches the entire dataset for the nearest state-action sample and then restricts the policy with the action of this sample. Unlike previous works, PRDC can guide the policy with proper behaviors from the dataset, allowing it to choose actions that do not appear in the dataset along with the given state. It is a softer constraint, yet it remains sufficiently conservative against out-of-distribution actions. Empirical evidence and theoretical analysis show that PRDC can alleviate offline RL's fundamentally challenging value overestimation issue with a bounded performance gap. Moreover, on a set of locomotion and navigation tasks, PRDC achieves state-of-the-art performance compared with existing methods. Code is available at https://github.com/LAMDA-RL/PRDC
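
The nearest-sample regularization described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes a brute-force Euclidean search over concatenated state-action vectors, a squared-error penalty, and a hypothetical weight `beta` that scales the state part of the distance. The function names and the toy data are likewise illustrative assumptions.

```python
import numpy as np


def nearest_dataset_action(state, policy_action, data_states, data_actions, beta=2.0):
    """Return the action (and index) of the dataset sample nearest to (state, policy_action).

    Assumption: distance is Euclidean over the concatenated vector
    [beta * state, action]; `beta` is a hypothetical state weight.
    """
    query = np.concatenate([beta * state, policy_action])
    keys = np.concatenate([beta * data_states, data_actions], axis=1)
    dists = np.linalg.norm(keys - query, axis=1)  # brute-force search over the dataset
    idx = int(np.argmin(dists))
    return data_actions[idx], idx


def dataset_constraint_penalty(state, policy_action, data_states, data_actions, beta=2.0):
    """Squared distance between the policy's action and the nearest sample's action."""
    target, _ = nearest_dataset_action(state, policy_action, data_states, data_actions, beta)
    return float(np.sum((policy_action - target) ** 2))


# Toy dataset (illustrative values only): two states, each with one recorded action.
data_states = np.array([[0.0, 0.0], [1.0, 1.0]])
data_actions = np.array([[0.0], [1.0]])

state = np.array([0.9, 0.9])
policy_action = np.array([0.2])

# With a large state weight, the nearest sample is the one recorded near `state`.
_, idx_state_weighted = nearest_dataset_action(state, policy_action, data_states, data_actions, beta=2.0)

# With beta = 0 the state is ignored, so the nearest sample may come from a
# different state whose action is closer to the policy's action.
_, idx_action_only = nearest_dataset_action(state, policy_action, data_states, data_actions, beta=0.0)

penalty_state_weighted = dataset_constraint_penalty(state, policy_action, data_states, data_actions, beta=2.0)
penalty_action_only = dataset_constraint_penalty(state, policy_action, data_states, data_actions, beta=0.0)
```

In this sketch the penalty would be added to the policy objective as a regularizer. Because the search runs over state-action pairs rather than over actions recorded for the exact query state, the selected action may come from a different state in the dataset, which is what makes the constraint softer than a per-state behavior constraint. For large datasets a spatial index would likely replace the brute-force search.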


