Unsupervised Learning and Exploration of Reachable Outcome Space

09/12/2019
by Paolo Giuseppe, et al.

Performing Reinforcement Learning in sparse-reward settings with very little prior knowledge is challenging because no signal is available to guide the learning process. In such situations, a good search strategy is fundamental. At the same time, it is highly desirable not to have to adapt the algorithm to every single problem. Here we introduce TAXONS, a Task Agnostic eXploration of Outcome spaces through Novelty and Surprise algorithm. Built on a population-based divergent-search approach, it learns a set of diverse policies directly from high-dimensional observations, without any task-specific information. TAXONS builds a repertoire of policies while training an autoencoder on the high-dimensional observation of the system's final state, which yields a low-dimensional outcome space. The learned outcome space, combined with the autoencoder's reconstruction error, drives the search for new policies. Results show that TAXONS can find a diverse set of controllers that cover a good part of the ground-truth outcome space, despite having no information about that space.
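
To make the two search signals named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that novelty is measured as the mean distance to the k nearest archived points in the learned outcome space, and that surprise is the autoencoder's mean squared reconstruction error. The abstract does not say how the two signals are combined, so the sketch exposes them as separate ranking criteria. All names (`novelty`, `surprise`, `select_for_archive`, `toy_encode`, `toy_decode`) are hypothetical, and the toy encoder/decoder only stands in for a trained autoencoder.

```python
import math


def novelty(latent, archive_latents, k=3):
    """Mean Euclidean distance from `latent` to its k nearest archive entries.

    Assumption: a k-nearest-neighbour novelty measure in the learned space.
    """
    if not archive_latents:
        return float("inf")  # nothing to compare against yet
    dists = sorted(math.dist(latent, other) for other in archive_latents)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


def surprise(observation, encode, decode):
    """Mean squared reconstruction error of the autoencoder on one observation."""
    reconstruction = decode(encode(observation))
    squared = [(o - r) ** 2 for o, r in zip(observation, reconstruction)]
    return sum(squared) / len(observation)


def select_for_archive(candidates, archive_latents, encode, decode,
                       n_keep=1, use_surprise=False):
    """Rank final-state observations by one criterion and keep the best n_keep.

    Assumption: one criterion is used per call; how the two are combined
    in practice is left open here.
    """
    def score(obs):
        if use_surprise:
            return surprise(obs, encode, decode)
        return novelty(encode(obs), archive_latents)

    return sorted(candidates, key=score, reverse=True)[:n_keep]


# Toy stand-ins for a trained autoencoder (hypothetical): the "latent" is
# just the first two coordinates of a 4-dimensional observation.
def toy_encode(observation):
    return tuple(observation[:2])


def toy_decode(latent):
    return tuple(latent) + (0.0, 0.0)
```

With these stand-ins, a candidate whose encoding lies far from every archived point ranks highest under novelty, while a candidate the decoder cannot reproduce ranks highest under surprise. In both cases selection uses only the learned representation and never a hand-designed outcome descriptor.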

