Safe Exploration for Constrained Reinforcement Learning with Provable Guarantees

12/01/2021 ∙ by Archana Bura, et al.
We consider the problem of learning an episodic safe control policy that minimizes an objective function while satisfying necessary safety constraints – both during learning and deployment. We formulate this safety-constrained reinforcement learning (RL) problem using the framework of a finite-horizon Constrained Markov Decision Process (CMDP) with an unknown transition probability function. Here, we model the safety requirements as constraints on the expected cumulative costs that must be satisfied during all episodes of learning. We propose a model-based safe RL algorithm, the Optimistic-Pessimistic Safe Reinforcement Learning (OPSRL) algorithm, and show that it achieves an Õ(S^2 √(A H^7 K) / (C̄ - C̄_b)) cumulative regret without violating the safety constraints during learning, where S is the number of states, A is the number of actions, H is the horizon length, K is the number of learning episodes, and (C̄ - C̄_b) is the safety gap, i.e., the difference between the constraint value and the cost of a known safe baseline policy. The Õ(√K) scaling is the same as in the traditional approach, where constraints may be violated during learning, which means that our algorithm suffers no additional regret in spite of providing a safety guarantee. Our key idea is to use an optimistic exploration approach with pessimistic constraint enforcement for learning the policy. This approach simultaneously incentivizes the exploration of unknown states while imposing a penalty for visiting states that are likely to cause violations of the safety constraints. We validate our algorithm by evaluating its performance on benchmark problems against conventional approaches.
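To make the optimism/pessimism split concrete, the sketch below shows one way such a learner could be structured: the objective is planned optimistically (a count-based bonus lowers the estimated cost-to-go, encouraging exploration of rarely visited state-action pairs), while the safety cost is evaluated pessimistically (the same bonus inflates it), and the agent falls back to the known safe baseline policy whenever the pessimistic estimate cannot certify the constraint. This is a minimal sketch under assumptions of our own: the bonus form, the greedy-plus-fallback rule, and all names (bonus, greedy_optimistic_policy, pessimistic_cost, choose_policy) are illustrative and not taken from the paper, whose OPSRL algorithm instead solves a constrained planning problem on the learned model.

```python
import numpy as np

def bonus(n, H, delta=0.1):
    # Hoeffding-style confidence radius; shrinks as visit counts n grow.
    # n has shape (S, A): state-action visit counts. Constants are illustrative.
    return H * np.sqrt(np.log(2.0 / delta) / np.maximum(n, 1))

def greedy_optimistic_policy(P_hat, r, n, H):
    # Backward induction on the empirical model P_hat (shape (S, A, S)),
    # minimizing an optimistic, bonus-lowered objective r (shape (S, A)).
    S, A = r.shape
    pi, v = np.zeros((H, S), dtype=int), np.zeros(S)
    for h in reversed(range(H)):
        q = np.clip(r - bonus(n, H), 0.0, None) + np.einsum('sap,p->sa', P_hat, v)
        pi[h], v = q.argmin(axis=1), q.min(axis=1)
    return pi

def pessimistic_cost(P_hat, c, n, H, pi):
    # Evaluate pi's expected cumulative safety cost under a bonus-inflated
    # (pessimistic) per-step cost c (shape (S, A)), capped at H per step.
    S, _ = c.shape
    v = np.zeros(S)
    for h in reversed(range(H)):
        q = np.minimum(c + bonus(n, H), H) + np.einsum('sap,p->sa', P_hat, v)
        v = q[np.arange(S), pi[h]]
    return v

def choose_policy(P_hat, r, c, n, H, C_bar, pi_baseline, s0):
    # Play the optimistic policy only if its pessimistic cost certifies the
    # constraint C_bar from the initial state s0; otherwise fall back to the
    # known safe baseline (a simplification of the paper's constrained planning).
    pi = greedy_optimistic_policy(P_hat, r, n, H)
    if pessimistic_cost(P_hat, c, n, H, pi)[s0] <= C_bar:
        return pi
    return pi_baseline
```

The point the sketch tries to convey is that the same confidence radius is applied in opposite directions: subtracted from the objective to drive exploration, and added to the cost so that any policy certified as safe under the pessimistic model remains safe under the true model with high probability.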


