HJB Optimal Feedback Control with Deep Differential Value Functions and Action Constraints

by Michael Lutter, et al.

Learning optimal feedback control laws capable of executing optimal trajectories is essential for many robotic applications. Such policies can be learned using reinforcement learning or planned using optimal control. While reinforcement learning is sample inefficient, optimal control only plans an optimal trajectory from a specific starting configuration. In this paper we propose deep optimal feedback control to learn an optimal feedback policy rather than a single trajectory. By exploiting the inherent structure of the robot dynamics and a strictly convex action cost, we derive principled cost functions such that, given the optimal value function, the optimal policy naturally obeys the action limits and is globally optimal and stable on the training domain. The corresponding optimal value function is learned end-to-end by embedding a deep differential network in the Hamilton-Jacobi-Bellman (HJB) differential equation and minimizing the residual of this equation, while gradually reducing the discounting from short-sighted to far-sighted to make learning feasible. The proposed approach learns an optimal feedback control law in continuous time that, in contrast to existing approaches, generates an optimal trajectory from any point in state space without replanning. The approach is evaluated on non-linear systems and achieves optimal feedback control, where standard optimal control methods require frequent replanning.
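To illustrate how a strictly convex action cost can make the optimal policy respect action limits, the following is a minimal sketch under stated assumptions, not the paper's implementation. It assumes control-affine dynamics, so the action-dependent part of the HJB right-hand side is g(u) + p u with p standing for B(x)^T dV/dx, and it assumes one particular barrier-shaped cost g whose derivative is beta * atanh(u / u_max). The names `action_cost`, `optimal_action`, `action_term`, `u_max`, `beta` and `p` are hypothetical and do not appear in the abstract.

```python
import math


def action_cost(u, u_max=1.0, beta=1.0):
    """Assumed strictly convex action cost g(u), finite only for |u| < u_max.

    Its derivative is g'(u) = beta * atanh(u / u_max), which grows without
    bound as |u| approaches u_max and so acts like a barrier.
    """
    r = u / u_max
    return beta * (u * math.atanh(r) + 0.5 * u_max * math.log(1.0 - r * r))


def optimal_action(p, u_max=1.0, beta=1.0):
    """Minimizer of g(u) + p * u, where p stands for B(x)^T dV/dx.

    Setting g'(u) = -p gives u* = u_max * tanh(-p / beta), which can never
    exceed the action limit in magnitude.
    """
    return u_max * math.tanh(-p / beta)


def action_term(u, p, u_max=1.0, beta=1.0):
    """Action-dependent part of the HJB right-hand side (scalar action)."""
    return action_cost(u, u_max, beta) + p * u


if __name__ == "__main__":
    p, u_max, beta = 0.4, 2.0, 0.5  # illustrative values only
    u_star = optimal_action(p, u_max, beta)
    # Brute-force comparison on a grid strictly inside the action limits.
    grid = [u_max * (k / 1000.0) for k in range(-999, 1000)]
    best = min(grid, key=lambda u: action_term(u, p, u_max, beta))
    print("closed form:", u_star, "grid search:", best)
```

With this choice the policy is available in closed form once the value-function gradient is known, so no inner optimization over actions is needed. Other strictly convex costs lead to other saturating policies; the tanh shape here is only one example.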


Interplanetary Transfers via Deep Representations of the Optimal Policy and/or of the Value Function

A number of applications to interplanetary trajectories have been recent...

Neural Optimal Control using Learned System Dynamics

We study the problem of generating control laws for systems with unknown...

On Constructing the Value Function for Optimal Trajectory Problem and its Application to Image Processing

We proposed an algorithm for solving Hamilton-Jacobi equation associated...

Quantifying the Effect of Feedback Frequency in Interactive Reinforcement Learning for Robotic Tasks

Reinforcement learning (RL) has become widely adopted in robot control. ...

Real-Time Optimal Guidance and Control for Interplanetary Transfers Using Deep Networks

We consider the Earth-Venus mass-optimal interplanetary transfer of a lo...

Entropy Regularised Deterministic Optimal Control: From Path Integral Solution to Sample-Based Trajectory Optimisation

Sample-based trajectory optimisers are a promising tool for the control ...

Control Frequency Adaptation via Action Persistence in Batch Reinforcement Learning

The choice of the control frequency of a system has a relevant impact on...
