ConserWeightive Behavioral Cloning for Reliable Offline Reinforcement Learning

10/11/2022
by   Tung Nguyen, et al.

The goal of offline reinforcement learning (RL) is to learn near-optimal policies from static logged datasets, thus sidestepping expensive online interactions. Behavioral cloning (BC) provides a straightforward solution to offline RL by mimicking offline trajectories via supervised learning. Recent advances (Chen et al., 2021; Janner et al., 2021; Emmons et al., 2021) have shown that by conditioning on desired future returns, BC can perform competitively with its value-based counterparts, while being far simpler and more stable to train. However, the distribution of returns in the offline dataset can be arbitrarily skewed and suboptimal, which poses a unique challenge for conditioning BC on expert returns at test time. We propose ConserWeightive Behavioral Cloning (CWBC), a simple and effective method for improving the performance of conditional BC for offline RL with two key components: trajectory weighting and conservative regularization. Trajectory weighting addresses the bias-variance tradeoff in conditional BC and provides a principled mechanism to learn from both low-return trajectories (typically plentiful) and high-return trajectories (typically few). Further, we analyze the notion of conservatism in existing BC methods and propose a novel conservative regularizer that explicitly encourages the policy to stay close to the data distribution. The regularizer yields more reliable performance and removes the need for ad-hoc tuning of the conditioning value during evaluation. We instantiate CWBC in the context of Reinforcement Learning via Supervised Learning (RvS) (Emmons et al., 2021) and Decision Transformer (DT) (Chen et al., 2021), and empirically show that it significantly boosts the performance and stability of prior methods on various offline RL benchmarks. Code is available at https://github.com/tung-nd/cwbc.
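
To make the two components concrete, below is a minimal, hypothetical PyTorch sketch of return-conditioned BC with trajectory weighting and a conservative regularizer. The names (`ReturnConditionedPolicy`, `trajectory_weights`, `cwbc_style_loss`), the softmax weighting scheme, and the `return_boost`/`lambda_reg` parameters are illustrative assumptions, not the exact formulation from the paper or the linked repository.

```python
# Illustrative sketch only: not the official CWBC implementation.
import torch
import torch.nn as nn


class ReturnConditionedPolicy(nn.Module):
    """MLP that maps (state, target return) to an action (RvS-style conditioning)."""

    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, target_return):
        # state: (B, state_dim), target_return: (B, 1)
        return self.net(torch.cat([state, target_return], dim=-1))


def trajectory_weights(returns, temperature=1.0):
    """Assumed weighting scheme: upweight high-return trajectories via a softmax
    over returns, while low-return (plentiful) trajectories keep nonzero weight."""
    # returns: (B,) trajectory returns for the sampled batch
    return torch.softmax(returns / temperature, dim=0)


def cwbc_style_loss(policy, states, actions, returns, weights,
                    lambda_reg=0.1, return_boost=1.2):
    """Weighted BC loss plus a conservative term that anchors the policy to
    dataset actions even when conditioned on inflated (out-of-distribution) returns."""
    pred = policy(states, returns)                              # (B, action_dim)
    per_sample = ((pred - actions) ** 2).mean(dim=-1)           # (B,)
    bc_loss = (weights * per_sample).sum()

    # Conservative regularizer (assumed form): condition on higher-than-observed
    # returns and still push predictions toward in-distribution actions.
    pred_ood = policy(states, returns * return_boost)
    reg = ((pred_ood - actions) ** 2).mean()
    return bc_loss + lambda_reg * reg
```

The regularizer term mirrors the idea stated in the abstract: even when the policy is conditioned on returns higher than those observed in the data, it stays close to actions seen in the dataset, which is what makes evaluation less sensitive to the choice of conditioning value.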


