Bayesian Reparameterization of Reward-Conditioned Reinforcement Learning with Energy-based Models

05/18/2023
by Wenhao Ding, et al.

Recently, reward-conditioned reinforcement learning (RCRL) has gained popularity due to its simplicity, flexibility, and off-policy nature. However, we show that current RCRL approaches are fundamentally limited and fail to address two critical challenges of RCRL: improving generalization on high reward-to-go (RTG) inputs, and avoiding out-of-distribution (OOD) RTG queries at test time. To address these challenges when training vanilla RCRL architectures, we propose Bayesian Reparameterized RCRL (BR-RCRL), a novel set of inductive biases for RCRL inspired by Bayes' theorem. BR-RCRL removes a core obstacle preventing vanilla RCRL from generalizing on high RTG inputs: the model's tendency to treat different RTG inputs as independent values, which we term "RTG Independence". BR-RCRL also allows us to design an accompanying adaptive inference method, which maximizes total returns while avoiding OOD queries that yield unpredictable behaviors in vanilla RCRL methods. We show that BR-RCRL achieves state-of-the-art performance on the Gym-Mujoco and Atari offline RL benchmarks, improving upon vanilla RCRL by up to 11
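
The abstract gives no formulas, so the following is only a rough sketch of what a Bayes-rule view of a reward-conditioned policy could look like for a small discrete action set, not the authors' implementation. It assumes the policy is written as pi(a | s, R) proportional to p(R | s, a) * beta(a | s), with a toy Gaussian likelihood standing in for the energy-based model named in the title. The names `bayes_policy`, `pick_rtg`, `toy_log_lik`, and the evidence threshold are all hypothetical.

```python
import math


def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def bayes_policy(log_lik, log_prior):
    """Return pi(a | s, R) from log p(R | s, a) and log beta(a | s).

    Applies Bayes' rule over a discrete action set (an assumption of this sketch).
    """
    joint = [l + p for l, p in zip(log_lik, log_prior)]
    log_evidence = log_sum_exp(joint)
    return [math.exp(x - log_evidence) for x in joint]


def pick_rtg(candidates, log_lik_fn, log_prior, min_log_evidence):
    """Pick the highest candidate RTG whose evidence log p(R | s) clears a threshold.

    Candidates with very low evidence are treated as out-of-distribution.
    Returns None if no candidate qualifies.
    """
    feasible = []
    for rtg in candidates:
        joint = [l + p for l, p in zip(log_lik_fn(rtg), log_prior)]
        if log_sum_exp(joint) >= min_log_evidence:
            feasible.append(rtg)
    return max(feasible) if feasible else None


# Toy setup (hypothetical): two actions whose returns centre on 1.0 and 3.0.
ACTION_MEANS = [1.0, 3.0]
SIGMA = 0.5


def toy_log_lik(rtg):
    """Gaussian log-density of the RTG under each action, a stand-in for a learned model."""
    norm = math.log(SIGMA * math.sqrt(2.0 * math.pi))
    return [-0.5 * ((rtg - mu) / SIGMA) ** 2 - norm for mu in ACTION_MEANS]


log_prior = [math.log(0.5), math.log(0.5)]  # uniform behavior policy beta(a | s)

target = pick_rtg([1.0, 3.0, 10.0], toy_log_lik, log_prior, min_log_evidence=-5.0)
probs = bayes_policy(toy_log_lik(target), log_prior)
```

In this toy setup the candidate RTG of 10.0 has negligible evidence under either action, so it is rejected as out-of-distribution, and the policy is conditioned on the highest supported return instead.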


research
02/24/2021

Information Directed Reward Learning for Reinforcement Learning

For many reinforcement learning (RL) applications, specifying a reward i...
research
01/31/2022

Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning

Recent progress in deep learning has relied on access to large and diver...
research
07/16/2023

Magnetic Field-Based Reward Shaping for Goal-Conditioned Reinforcement Learning

Goal-conditioned reinforcement learning (RL) is an interesting extension...
research
07/20/2021

Offline Preference-Based Apprenticeship Learning

We study how an offline dataset of prior (possibly random) experience ca...
research
05/25/2022

Learning to Query Internet Text for Informing Reinforcement Learning Agents

Generalization to out of distribution tasks in reinforcement learning is...
research
09/12/2023

Reasoning with Latent Diffusion in Offline Reinforcement Learning

Offline reinforcement learning (RL) holds promise as a means to learn hi...
