Learning to Act Safely with Limited Exposure and Almost Sure Certainty

05/18/2021
by   Agustin Castellano, et al.

This paper argues that learning to take safe actions in unknown environments, even with probability-one guarantees, does not require an unbounded number of exploratory trials, provided that one is willing to navigate trade-offs between optimality, the level of exposure to unsafe events, and the maximum detection time of unsafe actions. We illustrate this idea in two complementary settings. We first consider the canonical multi-armed bandit problem and study the intrinsic trade-offs of learning safety in the presence of uncertainty. Under mild assumptions on sufficient exploration, we provide an algorithm that provably detects all unsafe machines in a finite expected number of rounds. The analysis also reveals a trade-off between the number of rounds needed to secure the environment and the probability of discarding safe machines. We then consider the problem of finding optimal policies for a Markov Decision Process (MDP) with almost sure constraints. We show that the (action) value function satisfies a barrier-based decomposition, which allows feasible policies to be identified independently of the reward process. Using this decomposition, we develop a Barrier-learning algorithm that identifies unsafe state-action pairs in a finite expected number of steps. Our analysis further highlights a trade-off between the time lag the underlying MDP needs to detect unsafe actions and the level of exposure to unsafe events. Simulations corroborate our theoretical findings, illustrate the aforementioned trade-offs, and suggest that safety constraints can speed up the learning process.
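The bandit trade-off described above can be made concrete with a small simulation. The sketch below is not the paper's algorithm: it assumes a simple hypothetical rule in which every arm emits a binary damage signal and an arm is discarded once its empirical damage rate, measured over at least `min_pulls` pulls, exceeds `threshold`. The names `detect_unsafe_arms`, `damage_probs`, `threshold`, `min_pulls`, and `max_rounds` are all illustrative assumptions.

```python
import random


def detect_unsafe_arms(damage_probs, threshold, min_pulls, max_rounds, seed=0):
    """Toy unsafe-arm detection (illustrative assumption, not the paper's method).

    damage_probs[i] is the hypothetical probability that pulling arm i
    produces an unsafe event. Arms are pulled round-robin; an arm is
    discarded once it has at least min_pulls pulls and its empirical
    damage rate exceeds threshold.
    """
    rng = random.Random(seed)
    n_arms = len(damage_probs)
    pulls = [0] * n_arms
    damages = [0] * n_arms
    active = set(range(n_arms))

    for _ in range(max_rounds):
        # sorted() copies the set, so discarding inside the loop is safe.
        for arm in sorted(active):
            pulls[arm] += 1
            if rng.random() < damage_probs[arm]:
                damages[arm] += 1
            rate = damages[arm] / pulls[arm]
            if pulls[arm] >= min_pulls and rate > threshold:
                active.discard(arm)

    exposure = sum(damages)  # total number of unsafe events suffered
    return active, pulls, exposure


if __name__ == "__main__":
    kept, pulls, exposure = detect_unsafe_arms(
        damage_probs=[0.0, 0.02, 0.6],  # hypothetical arms
        threshold=0.3,
        min_pulls=10,
        max_rounds=500,
    )
    print("arms kept:", sorted(kept))
    print("pulls per arm:", pulls)
    print("unsafe events observed:", exposure)
```

Under these assumptions, lowering `threshold` or `min_pulls` tends to discard unsafe arms sooner and so reduces exposure, but it also raises the chance that an arm with a small nonzero damage rate is discarded by mistake, which mirrors the trade-off between detection time and the probability of discarding safe machines.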

research
10/01/2020

Learning to be safe, in finite time

This paper aims to put forward the concept that learning to take safe ac...
research
12/24/2020

Assured RL: Reinforcement Learning with Almost Sure Constraints

We consider the problem of finding optimal policies for a Markov Decisio...
research
01/02/2022

Reinforcement Learning for Task Specifications with Action-Constraints

In this paper, we use concepts from supervisory control theory of discre...
research
02/28/2019

Active Exploration in Markov Decision Processes

We introduce the active exploration problem in Markov decision processes...
research
03/01/2020

Provably Efficient Safe Exploration via Primal-Dual Policy Optimization

We study the Safe Reinforcement Learning (SRL) problem using the Constra...
research
03/29/2023

Did You Mean...? Confidence-based Trade-offs in Semantic Parsing

We illustrate how a calibrated model can help balance common trade-offs ...
research
07/13/2020

On the Effectiveness of Tracking and Testing in SEIR Models

We study the effectiveness of tracking and testing in mitigating or supp...
