Training Transition Policies via Distribution Matching for Complex Tasks

by Ju-Seung Byun, et al.

Humans decompose novel complex tasks into simpler ones to exploit previously learned skills. Analogously, hierarchical reinforcement learning seeks to leverage lower-level policies for simple tasks to solve complex ones. However, because each lower-level policy induces a different distribution of states, transitioning from one lower-level policy to another may fail due to an unexpected starting state. We introduce transition policies that smoothly connect lower-level policies by producing a distribution of states and actions that matches what is expected by the next policy. Training transition policies is challenging because the natural reward signal – whether the next policy can execute its subtask successfully – is sparse. By training transition policies via adversarial inverse reinforcement learning to match the distribution of expected states and actions, we avoid relying on task-based reward. To further improve performance, we use deep Q-learning with a binary action space to determine when to switch from a transition policy to the next pre-trained policy, using the success or failure of the next subtask as the reward. Although the reward is still sparse, the problem is less severe due to the simple binary action space. We demonstrate our method on continuous bipedal locomotion and arm manipulation tasks that require diverse skills. We show that it smoothly connects the lower-level policies, achieving higher success rates than previous methods that search for successful trajectories based on a reward function, but do not match the state distribution.
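The switching decision described above can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the authors' implementation: tabular Q-learning stands in for the deep Q-network, the state is a hypothetical 1-D index that the transition policy advances by one per step, and the next policy is assumed to succeed only when started from a state in `GOOD_STARTS`. All names and hyperparameters here are illustrative. The binary action space (continue or switch) and the sparse success/failure reward follow the abstract.

```python
import random

N_STATES = 10            # hypothetical 1-D state index (assumption)
GOOD_STARTS = {6, 7, 8}  # states from which the next policy is assumed to succeed
CONTINUE, SWITCH = 0, 1  # binary action space


def step(state, action):
    """Return (next_state, reward, done) for the toy switching problem."""
    if action == SWITCH:
        # Sparse reward: 1 only if the next policy would succeed from here.
        reward = 1.0 if state in GOOD_STARTS else 0.0
        return state, reward, True
    # The (toy) transition policy moves the state one step forward.
    return min(state + 1, N_STATES - 1), 0.0, False


def greedy(q, state):
    """Pick SWITCH only when its value strictly exceeds CONTINUE."""
    return SWITCH if q[state][SWITCH] > q[state][CONTINUE] else CONTINUE


def train(episodes=5000, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = rng.randrange(N_STATES)  # assumed uniform start states
        for _ in range(N_STATES):        # cap on episode length
            if rng.random() < epsilon:
                action = rng.choice((CONTINUE, SWITCH))
            else:
                action = greedy(q, state)
            next_state, reward, done = step(state, action)
            # Bootstrap from the next state unless the episode ended.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            if done:
                break
            state = next_state
    return q


q_table = train()
switch_plan = [greedy(q_table, s) for s in range(N_STATES)]
print(switch_plan)
```

In this toy, the learned plan keeps running the transition policy until the state enters the region the next policy expects, then switches. Because there are only two actions, the sparse reward propagates back through the table quickly, which is the intuition the abstract gives for why the binary action space makes sparsity less severe.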



Learning Novel Policies For Tasks

In this work, we present a reinforcement learning algorithm that can fin...

Latent Space Policies for Hierarchical Reinforcement Learning

We address the problem of learning hierarchical deep neural network poli...

Learning Setup Policies: Reliable Transition Between Locomotion Behaviours

Dynamic platforms that operate over many unique terrain conditions typica...

Ranking Policy Decisions

Policies trained via Reinforcement Learning (RL) are often needlessly co...

Reward Informed Dreamer for Task Generalization in Reinforcement Learning

A long-standing goal of reinforcement learning is that algorithms can le...

Mesh Based Analysis of Low Fractal Dimension Reinforcement Learning Policies

In previous work, using a process we call meshing, the reachable state s...

Variational Policy Search using Sparse Gaussian Process Priors for Learning Multimodal Optimal Actions

Policy search reinforcement learning has been drawing much attention as ...
