Hindsight Expectation Maximization for Goal-conditioned Reinforcement Learning

06/13/2020
by Yunhao Tang, et al.

We propose a graphical model framework for goal-conditioned RL, with an EM algorithm that operates on the lower bound of the RL objective. The E-step provides a natural interpretation of how 'learning in hindsight' techniques, such as HER, handle extremely sparse goal-conditioned rewards. The M-step reduces policy optimization to supervised learning updates, which greatly stabilizes end-to-end training on high-dimensional inputs such as images. We show that the combined algorithm, hEM, significantly outperforms model-free baselines on a wide range of goal-conditioned benchmarks with sparse rewards.
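To make the two steps concrete, below is a minimal sketch of one hEM-style iteration in PyTorch. The `GoalConditionedPolicy` class, the final-state relabeling rule, and all function names are illustrative assumptions, not the paper's released code: the E-step relabels each transition with a goal actually achieved later in the trajectory (as in HER), and the M-step fits the policy to the relabeled data with a supervised, behavioral-cloning-style log-likelihood update.

```python
"""Minimal sketch of one hEM-style iteration. The network, the
final-state relabeling rule, and all names are illustrative
assumptions, not the authors' released implementation."""
import torch
import torch.nn as nn

class GoalConditionedPolicy(nn.Module):
    """Categorical policy over discrete actions, conditioned on (state, goal)."""
    def __init__(self, state_dim, goal_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def log_prob(self, states, goals, actions):
        logits = self.net(torch.cat([states, goals], dim=-1))
        return torch.distributions.Categorical(logits=logits).log_prob(actions)

def e_step_relabel(trajectory):
    """E-step (hindsight): replace the commanded goal with a goal the agent
    actually achieved later in the trajectory (here, simply the final state),
    so the otherwise sparse reward signal becomes informative."""
    final_achieved = trajectory[-1][0]
    return [(s, a, final_achieved) for (s, a, _g) in trajectory]

def m_step_update(policy, optimizer, batch):
    """M-step: a supervised, behavioral-cloning-style update that maximizes
    the log-likelihood of the actions taken under the relabeled goals."""
    states, actions, goals = batch
    loss = -policy.log_prob(states, goals, actions).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the M-step is a plain maximum-likelihood regression rather than a policy-gradient update, it avoids high-variance gradients, which is the stability property the abstract attributes to training on high-dimensional inputs such as images.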

Related research

06/24/2022
Phasic Self-Imitative Reduction for Sparse-Reward Goal-Conditioned Reinforcement Learning
It has been a recent trend to leverage the power of supervised learning ...

07/01/2021
MHER: Model-based Hindsight Experience Replay
Solving multi-goal reinforcement learning (RL) problems with sparse rewa...

06/03/2021
Reinforcement Learning as One Big Sequence Modeling Problem
Reinforcement learning (RL) is typically concerned with estimating singl...

04/26/2023
Distance Weighted Supervised Learning for Offline Interaction Data
Sequential decision making algorithms often struggle to leverage differe...

04/23/2021
DisCo RL: Distribution-Conditioned Reinforcement Learning for General-Purpose Policies
Can we use reinforcement learning to learn general-purpose policies that...

12/04/2020
Planning from Pixels using Inverse Dynamics Models
Learning task-agnostic dynamics models in high-dimensional observation s...

07/19/2022
Generalizing Goal-Conditioned Reinforcement Learning with Variational Causal Reasoning
As a pivotal component to attaining generalizable solutions in human int...
