Safe-Critical Modular Deep Reinforcement Learning with Temporal Logic through Gaussian Processes and Control Barrier Functions

by   Mingyu Cai, et al.

Reinforcement learning (RL) is a promising approach and has limited success towards real-world applications, because ensuring safe exploration or facilitating adequate exploitation is a challenges for controlling robotic systems with unknown models and measurement uncertainties. Such a learning problem becomes even more intractable for complex tasks over continuous space (state-space and action-space). In this paper, we propose a learning-based control framework consisting of several aspects: (1) linear temporal logic (LTL) is leveraged to facilitate complex tasks over an infinite horizons which can be translated to a novel automaton structure; (2) we propose an innovative reward scheme for RL-agent with the formal guarantee such that global optimal policies maximize the probability of satisfying the LTL specifications; (3) based on a reward shaping technique, we develop a modular policy-gradient architecture utilizing the benefits of automaton structures to decompose overall tasks and facilitate the performance of learned controllers; (4) by incorporating Gaussian Processes (GPs) to estimate the uncertain dynamic systems, we synthesize a model-based safeguard using Exponential Control Barrier Functions (ECBFs) to address problems with high-order relative degrees. In addition, we utilize the properties of LTL automatons and ECBFs to construct a guiding process to further improve the efficiency of exploration. Finally, we demonstrate the effectiveness of the framework via several robotic environments. And we show such an ECBF-based modular deep RL algorithm achieves near-perfect success rates and guard safety with a high probability confidence during training.


page 1

page 15

page 16

page 17


End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks

Reinforcement Learning (RL) algorithms have found limited success beyond...

Tractable Reinforcement Learning of Signal Temporal Logic Objectives

Signal temporal logic (STL) is an expressive language to specify time-bo...

Temporal Logic Guided Safe Reinforcement Learning Using Control Barrier Functions

Using reinforcement learning to learn control policies is a challenge wh...

Model-based Reinforcement Learning from Signal Temporal Logic Specifications

Techniques based on Reinforcement Learning (RL) are increasingly being u...

Overcoming Exploration: Deep Reinforcement Learning in Complex Environments from Temporal Logic Specifications

We present a Deep Reinforcement Learning (DRL) algorithm for a task-guid...

Value Functions are Control Barrier Functions: Verification of Safe Policies using Control Theory

Guaranteeing safe behaviour of reinforcement learning (RL) policies pose...

Learning Stable Normalizing-Flow Control for Robotic Manipulation

Reinforcement Learning (RL) of robotic manipulation skills, despite its ...

Please sign up or login with your details

Forgot password? Click here to reset