Differentiable MPC for End-to-end Planning and Control

10/31/2018
by Brandon Amos, et al.

We present foundations for using Model Predictive Control (MPC) as a differentiable policy class for reinforcement learning in continuous state and action spaces. This provides one way of leveraging and combining the advantages of model-free and model-based approaches. Specifically, we differentiate through MPC by using the KKT conditions of the convex approximation at a fixed point of the controller. Using this strategy, we are able to learn the cost and dynamics of a controller via end-to-end learning. Our experiments focus on imitation learning in the pendulum and cartpole domains, where we learn the cost and dynamics terms of an MPC policy class. We show that our MPC policies are significantly more data-efficient than a generic neural network and that our method is superior to traditional system identification in a setting where the expert is unrealizable.
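The idea of differentiating through an optimizer via its KKT conditions can be illustrated on a much smaller problem than MPC. The sketch below is not the paper's implementation: it assumes a generic equality-constrained quadratic program, min 0.5 x'Qx + q'x subject to Ax = b, as a stand-in for the convex approximation mentioned above, and all function names (`solve`, `kkt_matrix`, `solve_qp`, `grad_wrt_q`) are hypothetical. Because the KKT system is linear and symmetric, the gradient of a downstream loss with respect to the linear cost term q can be obtained with one extra solve against the same KKT matrix.

```python
def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in M[i]] + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def kkt_matrix(Q, A):
    """Assemble the symmetric KKT matrix [[Q, A^T], [A, 0]]."""
    n, m = len(Q), len(A)
    K = [[0.0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        for j in range(n):
            K[i][j] = Q[i][j]
    for i in range(m):
        for j in range(n):
            K[n + i][j] = A[i][j]
            K[j][n + i] = A[i][j]
    return K


def solve_qp(Q, q, A, b):
    """Forward pass: the KKT conditions Qx + q + A^T lam = 0, Ax = b
    form one linear system whose first n entries are the minimizer x*."""
    n = len(Q)
    z = solve(kkt_matrix(Q, A), [-v for v in q] + list(b))
    return z[:n]


def grad_wrt_q(Q, A, grad_x):
    """Backward pass: given dL/dx* (grad_x), return dL/dq.
    Differentiating the KKT system gives dx*/dq = -(K^{-1})_xx, so
    dL/dq = -d_x where K [d_x; d_lam] = [grad_x; 0]."""
    n = len(Q)
    d = solve(kkt_matrix(Q, A), list(grad_x) + [0.0] * len(A))
    return [-v for v in d[:n]]


def tracking_loss(x, target):
    """Illustrative downstream loss: 0.5 * ||x - target||^2."""
    return 0.5 * sum((xi - ti) ** 2 for xi, ti in zip(x, target))
```

A typical use would call `solve_qp` to get x*, form `grad_x = [xi - ti for xi, ti in zip(x_star, target)]`, and pass it to `grad_wrt_q`; the result can be checked against finite differences of `tracking_loss` with respect to q.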

research
09/22/2022

Learning Model Predictive Controllers with Real-Time Attention for Real-World Navigation

Despite decades of research, existing navigation systems still face real...
research
08/15/2019

Model-based Lookahead Reinforcement Learning

Model-based Reinforcement Learning (MBRL) allows data-efficient learning...
research
01/07/2020

Infinite-Horizon Differentiable Model Predictive Control

This paper proposes a differentiable linear quadratic Model Predictive C...
research
08/03/2023

End-to-End Reinforcement Learning of Koopman Models for Economic Nonlinear MPC

(Economic) nonlinear model predictive control ((e)NMPC) requires dynamic...
research
11/08/2021

A Comparison of Model-Free and Model Predictive Control for Price Responsive Water Heaters

We present a careful comparison of two model-free control algorithms, Ev...
research
11/09/2018

Sample-Efficient Policy Learning based on Completely Behavior Cloning

Direct policy search is one of the most important algorithms of reinforce...
research
06/29/2017

Path Integral Networks: End-to-End Differentiable Optimal Control

In this paper, we introduce Path Integral Networks (PI-Net), a recurrent...
