Direct then Diffuse: Incremental Unsupervised Skill Discovery for State Covering and Goal Reaching

by Pierre-Alexandre Kamienny, et al.

Learning meaningful behaviors in the absence of reward is a difficult problem in reinforcement learning. A desirable and challenging unsupervised objective is to learn a set of diverse skills that provide a thorough coverage of the state space while being directed, i.e., reliably reaching distinct regions of the environment. In this paper, we build on the mutual information framework for skill discovery and introduce UPSIDE, which addresses the coverage-directedness trade-off in the following ways: 1) We design policies with a decoupled structure of a directed skill, trained to reach a specific region, followed by a diffusing part that induces a local coverage. 2) We optimize policies by maximizing their number under the constraint that each of them reaches distinct regions of the environment (i.e., they are sufficiently discriminable) and prove that this serves as a lower bound to the original mutual information objective. 3) Finally, we compose the learned directed skills into a growing tree that adaptively covers the environment. We illustrate in several navigation and control environments how the skills learned by UPSIDE solve sparse-reward downstream tasks better than existing baselines.
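For readers unfamiliar with the mutual information framework the abstract builds on, the standard variational bound can be sketched as follows. The notation here is an assumption for illustration (skill variable $Z$, state $S$, skill-conditioned policy $\pi_z$, learned discriminator $q_\phi$, number of skills $N_Z$) and is not necessarily the exact formulation used in the paper.

```latex
% Generic mutual-information skill-discovery objective (illustrative notation).
\begin{align}
  I(S; Z) &= H(Z) - H(Z \mid S) \\
          &= H(Z) + \mathbb{E}_{z \sim p(z),\, s \sim \pi_z}\big[\log p(z \mid s)\big] \\
          &\ge H(Z) + \mathbb{E}_{z \sim p(z),\, s \sim \pi_z}\big[\log q_\phi(z \mid s)\big].
\end{align}
% With a uniform prior over N_Z skills, the entropy term is
\begin{equation}
  H(Z) = \log N_Z .
\end{equation}
```

Under a uniform prior, the bound increases with the number of skills as long as the discriminator can still tell them apart, which is consistent with the abstract's description of maximizing the number of policies subject to a discriminability constraint.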

Behavior Contrastive Learning for Unsupervised Skill Discovery

In reinforcement learning, unsupervised skill discovery aims to learn di...

CIC: Contrastive Intrinsic Control for Unsupervised Skill Discovery

We introduce Contrastive Intrinsic Control (CIC), an algorithm for unsup...

Wasserstein Distance Maximizing Intrinsic Control

This paper deals with the problem of learning a skill-conditioned policy...

Wasserstein Unsupervised Reinforcement Learning

Unsupervised reinforcement learning aims to train agents to learn a hand...

Explore, Discover and Learn: Unsupervised Discovery of State-Covering Skills

Acquiring abilities in the absence of a task-oriented reward function is...

Open-Ended Reinforcement Learning with Neural Reward Functions

Inspired by the great success of unsupervised learning in Computer Visio...

One Big Net For Everything

I apply recent work on "learning to think" (2015) and on PowerPlay (2011...
