CLUTR: Curriculum Learning via Unsupervised Task Representation Learning

10/19/2022
by   Abdus Salam Azad, et al.

Reinforcement Learning (RL) algorithms are often sample-inefficient and generalize poorly. Recently, Unsupervised Environment Design (UED) emerged as a new paradigm for zero-shot generalization by simultaneously learning a task distribution and agent policies on the sampled tasks. This is a non-stationary process in which the task distribution evolves along with the agent policies, creating instability over time. While past works demonstrated the potential of such approaches, sampling effectively from the task space remains an open challenge that bottlenecks them. To this end, we introduce CLUTR: a novel curriculum learning algorithm that decouples task representation and curriculum learning into a two-stage optimization. It first trains a recurrent variational autoencoder on randomly generated tasks to learn a latent task manifold. Next, a teacher agent creates a curriculum by maximizing a minimax REGRET-based objective on a set of latent tasks sampled from this manifold. By keeping the task manifold fixed, we show that CLUTR successfully overcomes the non-stationarity problem and improves stability. Our experimental results show that CLUTR outperforms PAIRED, a principled and popular UED method, in terms of generalization and sample efficiency in the challenging CarRacing and navigation environments, showing an 18x improvement on the F1 CarRacing benchmark. CLUTR also performs comparably to the non-UED state-of-the-art for CarRacing, outperforming it in nine of the 20 tracks. CLUTR also achieves a 33 out-of-distribution navigation tasks.
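The two-stage idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a standard normal prior over the latent task space and a PAIRED-style regret estimate (best antagonist return minus mean protagonist return). The names sample_latent_task, estimate_regret, and toy_rollout are hypothetical, and toy_rollout is only a stand-in for real environment rollouts on a decoded task.

```python
import random
from statistics import mean


def sample_latent_task(dim, rng):
    """Draw a latent task vector from a standard normal prior (assumed)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def estimate_regret(antagonist_returns, protagonist_returns):
    """PAIRED-style estimate: best antagonist return minus mean protagonist return."""
    return max(antagonist_returns) - mean(protagonist_returns)


def toy_rollout(task, skill, rng, episodes=4):
    """Hypothetical stand-in for rollouts on the task decoded from `task`.

    Returns shrink with task difficulty (the latent vector's norm) and
    grow with the agent's skill, plus a little noise.
    """
    difficulty = sum(z * z for z in task) ** 0.5
    return [skill - difficulty + rng.gauss(0.0, 0.1) for _ in range(episodes)]


rng = random.Random(0)

# Stage 1 (assumed already done): a VAE has been trained and is kept fixed,
# so tasks can be proposed as points in its latent space.
task = sample_latent_task(dim=8, rng=rng)

# Stage 2: the teacher would be rewarded with the regret its proposed task induces.
regret = estimate_regret(
    toy_rollout(task, skill=3.0, rng=rng),  # antagonist (stronger here by construction)
    toy_rollout(task, skill=2.0, rng=rng),  # protagonist
)
```

In this sketch the teacher's search space is the fixed latent vector rather than raw task parameters, which is the decoupling the abstract credits for improved stability.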

research
10/31/2022

Teacher-student curriculum learning for reinforcement learning

Reinforcement learning (rl) is a popular paradigm for sequential decisio...
research
12/03/2020

Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design

A wide range of reinforcement learning (RL) problems - including robustn...
research
03/02/2021

Adversarial Environment Generation for Learning to Navigate the Web

Learning to autonomously navigate the web is a difficult sequential deci...
research
12/30/2022

Reinforcement Learning with Success Induced Task Prioritization

Many challenging reinforcement learning (RL) problems require designing ...
research
08/21/2023

Stabilizing Unsupervised Environment Design with a Learned Adversary

A key challenge in training generally-capable agents is the design of tr...
research
10/07/2019

Self-Paced Contextual Reinforcement Learning

Generalization and adaptation of learned skills to novel situations is a...
research
07/11/2022

Grounding Aleatoric Uncertainty in Unsupervised Environment Design

Adaptive curricula in reinforcement learning (RL) have proven effective ...