Sample-Efficient Policy Learning based on Completely Behavior Cloning

11/09/2018
by   Qiming Zou, et al.

Direct policy search is one of the most important algorithms in reinforcement learning. However, learning from scratch requires a large amount of experience data and is prone to poor local optima. In addition, a partially trained policy tends to take actions that are dangerous to the agent and the environment. To overcome these challenges, this paper proposes a policy initialization algorithm called Policy Learning based on Completely Behavior Cloning (PLCBC). PLCBC first transforms the Model Predictive Control (MPC) controller into a piecewise affine (PWA) function using multi-parametric programming, and then uses a neural network to express this function. In this way, PLCBC can completely clone the MPC controller without any performance loss, and is entirely training-free. The experiments show that this initialization strategy helps the agent learn in the high-reward state region and achieve faster and better convergence.
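
To make the "training-free" claim concrete, here is a minimal sketch of how a PWA control law can be written exactly as a small ReLU network whose weights are set analytically. It is not the authors' construction: it assumes a hypothetical scalar controller, a saturated linear feedback u = clip(-k x, u_min, u_max), which is the kind of three-region PWA law that explicit MPC produces for simple input-constrained problems. The names `pwa_controller`, `relu_network`, `k`, `u_min`, and `u_max` are illustrative assumptions.

```python
def relu(z: float) -> float:
    """Rectified linear unit."""
    return max(0.0, z)


def pwa_controller(x: float, k: float = 2.0,
                   u_min: float = -1.0, u_max: float = 1.0) -> float:
    """Reference PWA law (assumed for illustration): saturated linear feedback.

    Three affine regions: u = u_min, u = -k * x, u = u_max.
    """
    return min(u_max, max(u_min, -k * x))


def relu_network(x: float, k: float = 2.0,
                 u_min: float = -1.0, u_max: float = 1.0) -> float:
    """One hidden layer with two ReLU units; weights are chosen by hand.

    Hidden unit 1: relu(-k * x - u_min)
    Hidden unit 2: relu(-k * x - u_max)
    Output:        u_min + h1 - h2
    """
    z = -k * x
    h1 = relu(z - u_min)
    h2 = relu(z - u_max)
    return u_min + h1 - h2


# Compare the two functions on a grid of states in [-2, 2].
grid = [i / 100.0 for i in range(-200, 201)]
max_gap = max(abs(pwa_controller(x) - relu_network(x)) for x in grid)
print("largest difference on the grid:", max_gap)
```

The network reproduces the controller in all three regions because the two ReLU units switch on exactly at the region boundaries, so no data or gradient steps are involved. A general multi-dimensional PWA function from multi-parametric programming needs a larger network, which this sketch does not cover.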

research
10/31/2018

Differentiable MPC for End-to-end Planning and Control

We present foundations for using Model Predictive Control (MPC) as a dif...
research
09/15/2021

Infusing model predictive control into meta-reinforcement learning for mobile robots in dynamic environments

The successful operation of mobile robots requires them to rapidly adapt...
research
12/07/2021

Tailored neural networks for learning optimal value functions in MPC

Learning-based predictive control is a promising alternative to optimiza...
research
02/24/2021

Safe Learning-based Gradient-free Model Predictive Control Based on Cross-entropy Method

In this paper, a safe and learning-based control framework for model pre...
research
11/28/2019

Augmented Random Search for Quadcopter Control: An alternative to Reinforcement Learning

Model-based reinforcement learning strategies are believed to exhibit mo...
research
07/15/2020

Developmental Reinforcement Learning of Control Policy of a Quadcopter UAV with Thrust Vectoring Rotors

In this paper, we present a novel developmental reinforcement learning-b...
research
09/11/2019

MPC-Net: A First Principles Guided Policy Search

We present an Imitation Learning approach for the control of dynamical s...
