Learning Compositional Neural Programs for Continuous Control

07/27/2020
by   Thomas Pierrot, et al.

We propose a novel solution to challenging sparse-reward, continuous control problems that require hierarchical planning at multiple levels of abstraction. Our solution, dubbed AlphaNPI-X, involves three separate stages of learning. First, we use off-policy reinforcement learning algorithms with experience replay to learn a set of atomic goal-conditioned policies, which can be easily repurposed for many tasks. Second, we learn self-models describing the effect of the atomic policies on the environment. Third, the self-models are harnessed to learn recursive compositional programs with multiple levels of abstraction. The key insight is that the self-models enable planning by imagination, obviating the need for interaction with the world when learning higher-level compositional programs. To accomplish the third stage of learning, we extend the AlphaNPI algorithm, which applies AlphaZero to learn recursive neural programmer-interpreters. We empirically show that AlphaNPI-X can effectively learn to tackle challenging sparse manipulation tasks, such as stacking multiple blocks, where powerful model-free baselines fail.
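
To make "planning by imagination" concrete, here is a minimal sketch in Python, not the authors' implementation. It assumes a toy block-stacking state, a hand-written `self_model` standing in for the learned self-models, and plain breadth-first search in place of the AlphaZero-style tree search used by AlphaNPI-X. All names (`self_model`, `plan_by_imagination`, the `stack(top, base)` atomic program) are hypothetical. The point it illustrates is that the planner only ever queries the model and never the environment.

```python
from collections import deque
from itertools import permutations

# Toy state (an assumption for illustration): state[i] is the block that
# block i rests on, or -1 if block i sits on the table.


def is_clear(state, block):
    """A block is clear when no other block rests on it."""
    return block not in state


def self_model(state, program):
    """Hypothetical stand-in for a learned self-model.

    Predicts the state after running the atomic program stack(top, base),
    or returns None when the program is predicted to fail.
    """
    top, base = program
    if not (is_clear(state, top) and is_clear(state, base)):
        return None
    nxt = list(state)
    nxt[top] = base
    return tuple(nxt)


def plan_by_imagination(start, goal, n_blocks, max_depth=6):
    """Search over sequences of atomic programs using only the self-model.

    Breadth-first search is used here for brevity; it is a swapped-in
    technique, not the recursive tree search described in the abstract.
    """
    programs = list(permutations(range(n_blocks), 2))
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, plan = frontier.popleft()
        if state == goal:
            return plan
        if len(plan) >= max_depth:
            continue
        for program in programs:
            nxt = self_model(state, program)  # imagined step, no environment call
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [program]))
    return None


# Three blocks on the table; goal is a tower with 0 at the bottom, then 1, then 2.
start = (-1, -1, -1)
goal = (-1, 0, 1)
plan = plan_by_imagination(start, goal, n_blocks=3)
print(plan)
```

In this toy setting the returned plan is a list of `(top, base)` pairs that would then be handed to the atomic goal-conditioned policies for execution. In the paper's setting the model is learned from data, so imagined rollouts can be wrong in ways this hand-coded model cannot.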

research
06/25/2021

Compositional Reinforcement Learning from Logical Specifications

We study the problem of learning control policies for complex tasks give...
research
06/12/2019

Search on the Replay Buffer: Bridging Planning and Reinforcement Learning

The history of learning for control has been an exciting back and forth ...
research
10/23/2022

Active Predictive Coding: A Unified Neural Framework for Learning Hierarchical World Models for Perception and Planning

Predictive coding has emerged as a prominent model of how the brain lear...
research
10/26/2020

Refactoring Policy for Compositional Generalizability using Self-Supervised Object Proposals

We study how to learn a policy with compositional generalizability. We p...
research
02/08/2018

Improving the Universality and Learnability of Neural Programmer-Interpreters with Combinator Abstraction

To overcome the limitations of Neural Programmer-Interpreters (NPI) in i...
research
11/19/2019

Planning with Goal-Conditioned Policies

Planning methods can solve temporally extended sequential decision makin...
research
05/30/2019

Learning Compositional Neural Programs with Recursive Tree Search and Planning

We propose a novel reinforcement learning algorithm, AlphaNPI, that inco...
