PBCS: Efficient Exploration and Exploitation Using a Synergy between Reinforcement Learning and Motion Planning

by Guillaume Matheron, et al.

The exploration-exploitation trade-off is at the heart of reinforcement learning (RL). However, most continuous control benchmarks used in recent RL research only require local exploration. This has led to the development of algorithms with only basic exploration capabilities, which perform poorly on benchmarks that require more versatile exploration. For instance, as demonstrated in our empirical study, state-of-the-art RL algorithms such as DDPG and TD3 are unable to steer a point mass through even small 2D mazes. In this paper, we propose a new algorithm called "Plan, Backplay, Chain Skills" (PBCS) that combines motion planning and reinforcement learning to solve hard-exploration environments. In a first phase, a motion-planning algorithm finds a single good trajectory; an RL algorithm is then trained on a curriculum derived from this trajectory by combining a variant of the Backplay algorithm with skill chaining. We show that this method outperforms state-of-the-art RL algorithms in 2D maze environments of various sizes, and is able to improve on the trajectory obtained by the motion-planning phase.
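The two-phase outline in the abstract (plan one trajectory, then train backwards along it) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the environment interface, the `train_from_state` stub, and the success threshold are all assumptions, and the real PBCS additionally chains skills along the trajectory.

```python
# Hypothetical sketch of the PBCS outline: (1) a motion-planning phase
# supplies one feasible trajectory; (2) a Backplay-style curriculum trains
# an RL policy by resetting the agent to states taken progressively
# earlier along that trajectory. All names below are illustrative.

def backplay_curriculum(trajectory, step=1):
    """Yield reset states from the end of the trajectory backwards."""
    for i in range(len(trajectory) - 1, -1, -step):
        yield trajectory[i]

def train_pbcs(trajectory, train_from_state, success_threshold=0.9):
    """Walk the Backplay curriculum: only move the start state further
    back once the policy succeeds often enough from the current one."""
    results = []
    for start in backplay_curriculum(trajectory):
        success_rate = train_from_state(start)  # RL inner loop (stub)
        results.append((start, success_rate))
        if success_rate < success_threshold:
            break  # curriculum stalls; a real run would keep training here
    return results

# Toy usage: a straight-line "planned trajectory" and a fake trainer that
# always succeeds, so the curriculum walks all the way back to the start.
traj = [(x, 0) for x in range(5)]
log = train_pbcs(traj, train_from_state=lambda s: 1.0)
```

In a full implementation, `train_from_state` would run episodes of an off-policy learner such as TD3 from the given reset state, and the break point would instead trigger the skill-chaining step described in the paper.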



