Dynamics-Aware Latent Space Reachability for Exploration in Temporally-Extended Tasks

by Homanga Bharadhwaj, et al.

Self-supervised goal proposal and reaching is a key component of efficient policy learning algorithms. Such a self-supervised approach, without access to any oracle goal-sampling distribution, requires deep exploration and commitment so that long-horizon plans can be efficiently discovered. In this paper, we propose an exploration framework that learns a dynamics-aware manifold of reachable states. Given a new goal, our proposed method visits a state at the current frontier of reachable states (commitment/reaching) and then explores to reach the goal (exploration). This allocates the exploration budget near the frontier of the reachable region instead of its interior. We target the challenging problem of policy learning from initial and goal states specified as images, and do not assume any access to the underlying ground-truth states of the robot and the environment. To keep track of reachable latent states, we propose a distance-conditioned reachability network that is trained to infer whether one state is reachable from another within the specified latent-space distance. Thus, given an initial state, we obtain a frontier of states reachable from it. By incorporating a curriculum that samples easier goals (closer to the start state) before more difficult ones, we demonstrate that the proposed self-supervised exploration algorithm can achieve a 20% improvement in performance on average compared to existing baselines on a set of challenging robotic environments, including a real robot manipulation task.
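The frontier-selection step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `reachable` stands in for the learned distance-conditioned reachability network (here replaced by a hypothetical fixed-radius predictor), and `frontier_state` is an assumed helper name. The idea is to pick, among candidate latent states, one that the network still deems reachable from the current state but that lies as far out as possible.

```python
import numpy as np

def frontier_state(candidates, current, reachable, d_max):
    """Pick a candidate at the frontier: predicted reachable from
    `current`, at the largest latent-space distance up to d_max."""
    best, best_d = None, -1.0
    for z in candidates:
        d = np.linalg.norm(z - current)
        if d <= d_max and reachable(current, z, d) and d > best_d:
            best, best_d = z, d
    return best

# Toy stand-in for the learned reachability network: states within
# Euclidean distance 1.0 in latent space count as "reachable".
reachable = lambda z0, z1, d: d <= 1.0

rng = np.random.default_rng(0)
cands = [rng.normal(size=2) for _ in range(50)]  # candidate latents
z0 = np.zeros(2)                                # current latent state
zf = frontier_state(cands, z0, reachable, d_max=2.0)
```

Exploration would then continue from `zf` toward the sampled goal; the curriculum corresponds to gradually increasing the distance at which the reachability network is queried.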




