Safety and Liveness Guarantees through Reach-Avoid Reinforcement Learning

by   Kai-Chieh Hsu, et al.

Reach-avoid optimal control problems, in which the system must reach certain goal conditions while staying clear of unacceptable failure modes, are central to safety and liveness assurance for autonomous robotic systems, but their exact solutions are intractable for complex dynamics and environments. Recent successes in reinforcement learning methods to approximately solve optimal control problems with performance objectives make their application to certification problems attractive; however, the Lagrange-type objective used in reinforcement learning is not suitable to encode temporal logic requirements. Recent work has shown promise in extending the reinforcement learning machinery to safety-type problems, whose objective is not a sum, but a minimum (or maximum) over time. In this work, we generalize the reinforcement learning formulation to handle all optimal control problems in the reach-avoid category. We derive a time-discounted reach-avoid Bellman backup with contraction mapping properties and prove that the resulting reach-avoid Q-learning algorithm converges under analogous conditions to the traditional Lagrange-type problem, yielding an arbitrarily tight conservative approximation to the reach-avoid set. We further demonstrate the use of this formulation with deep reinforcement learning methods, retaining zero-violation guarantees by treating the approximate solutions as untrusted oracles in a model-predictive supervisory control framework. We evaluate our proposed framework on a range of nonlinear systems, validating the results against analytic and numerical solutions, and through Monte Carlo simulation in previously intractable problems. Our results open the door to a range of learning-based methods for safe-and-live autonomous behavior, with applications across robotics and automation. See for code and supplementary material.


page 1

page 5

page 6

page 7

page 8


Infinite-Horizon Reach-Avoid Zero-Sum Games via Deep Reinforcement Learning

In this paper, we consider the infinite-horizon reach-avoid zero-sum gam...

Solving Stabilize-Avoid Optimal Control via Epigraph Form and Deep Reinforcement Learning

Tasks for autonomous robotic systems commonly require stabilization to a...

A Control-Model-Based Approach for Reinforcement Learning

We consider a new form of model-based reinforcement learning methods tha...

Reach-Avoid Problems via Sum-of-Squares Optimization and Dynamic Programming

Reach-avoid problems involve driving a system to a set of desirable conf...

Actively Learning Reinforcement Learning: A Stochastic Optimal Control Approach

In this paper we provide framework to cope with two problems: (i) the fr...

A Minimum Discounted Reward Hamilton-Jacobi Formulation for Computing Reachable Sets

We propose a novel formulation for approximating reachable sets through ...

VBOC: Learning the Viability Boundary of a Robot Manipulator using Optimal Control

Safety is often the most important requirement in robotics applications....

Please sign up or login with your details

Forgot password? Click here to reset