Know Your Boundaries: The Necessity of Explicit Behavioral Cloning in Offline RL

06/01/2022
by Wonjoon Goo, et al.

We introduce an offline reinforcement learning (RL) algorithm that explicitly clones a behavior policy to constrain value learning. In offline RL, it is often important to prevent a policy from selecting unobserved actions, since the consequences of such actions cannot be inferred without additional information about the environment. One straightforward way to implement such a constraint is to explicitly model the data distribution via behavior cloning and directly prevent the policy from selecting uncertain actions. However, many offline RL methods instantiate the constraint indirectly, for example through pessimistic value estimation, out of concern about errors when modeling a potentially complex behavior policy. In this work, we argue that explicitly modeling the behavior policy is not only viable but beneficial for offline RL, because the constraint can be realized in a stable way with the trained model. We first suggest a theoretical framework that incorporates behavior-cloned models into value-based offline RL methods, enjoying the strengths of both explicit behavior cloning and value learning. We then propose a practical method that uses a score-based generative model for behavior cloning. With the proposed method, we show state-of-the-art performance on several datasets within the D4RL and Robomimic benchmarks and achieve competitive performance across all datasets tested.
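To make the core mechanism concrete, here is a minimal sketch (not the authors' implementation): tabular fitted Q-iteration on an offline dataset, where a behavior policy is explicitly cloned and the Bellman backup's max is restricted to actions the cloned model assigns sufficient probability. The paper uses a score-based generative model over continuous actions for this role; a count-based estimate stands in for it here, and all names and thresholds (n_states, eps, etc.) are illustrative.

```python
import numpy as np

n_states, n_actions, gamma, eps = 5, 3, 0.99, 0.05

rng = np.random.default_rng(0)

# A skewed behavior policy, so that some actions are rarely or never taken.
beta = rng.dirichlet(np.ones(n_actions) * 0.3, size=n_states)

# Offline dataset of (s, a, r, s') transitions collected by the behavior policy.
dataset = []
for _ in range(2000):
    s = int(rng.integers(n_states))
    a = int(rng.choice(n_actions, p=beta[s]))
    dataset.append((s, a, float(rng.standard_normal()), int(rng.integers(n_states))))

# Explicit behavior cloning: estimate pi_beta(a|s) from the data (a count-based
# model here, standing in for the paper's score-based generative model).
counts = np.zeros((n_states, n_actions))
for s, a, _, _ in dataset:
    counts[s, a] += 1
pi_beta_hat = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
support = pi_beta_hat >= eps  # actions the cloned model deems in-distribution

# Fitted Q-iteration whose backup maxes only over in-support actions, i.e.
#   Q(s,a) <- E[r + gamma * max_{a': pi_beta_hat(a'|s') >= eps} Q(s',a')],
# so the value function never bootstraps from unobserved actions.
Q = np.zeros((n_states, n_actions))
for _ in range(200):
    masked_Q = np.where(support, Q, -np.inf)
    v_next = masked_Q.max(axis=1)
    v_next[~support.any(axis=1)] = 0.0  # states with no in-support action
    target_sum = np.zeros_like(Q)
    target_cnt = np.zeros_like(Q)
    for s, a, r, s2 in dataset:
        target_sum[s, a] += r + gamma * v_next[s2]
        target_cnt[s, a] += 1
    seen = target_cnt > 0
    Q[seen] = target_sum[seen] / target_cnt[seen]

# The greedy policy is likewise restricted to in-support actions.
policy = np.where(support, Q, -np.inf).argmax(axis=1)
print("in-support actions per state:\n", support.astype(int))
print("constrained greedy policy:", policy)
```

The threshold eps operationalizes "knowing your boundaries": both the backup and the extracted policy only touch actions where the cloned behavior model assigns enough density. In the paper's continuous-action setting, the score-based generative model supplies this in-support judgment instead of a count table.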

Related research

06/12/2021
A Minimalist Approach to Offline Reinforcement Learning
Offline reinforcement learning (RL) defines the task of learning from a ...

09/29/2022
Offline Reinforcement Learning via High-Fidelity Generative Behavior Modeling
In offline reinforcement learning, weighted regression is a common metho...

03/17/2021
Regularized Behavior Value Estimation
Offline reinforcement learning restricts the learning process to rely on...

11/02/2022
Dual Generator Offline Reinforcement Learning
In offline RL, constraining the learned policy to remain close to the da...

04/04/2022
Capturing positive utilities during the estimation of recursive logit models: A prism-based approach
Although the recursive logit (RL) model has been recently popular and ha...

11/02/2022
Offline RL With Realistic Datasets: Heteroskedasticity and Support Constraints
Offline reinforcement learning (RL) learns policies entirely from static...

07/12/2023
Budgeting Counterfactual for Offline RL
The main challenge of offline reinforcement learning, where data is limi...
