Reinforcement Learning for Task Specifications with Action-Constraints

01/02/2022
by Arun Raman, et al.

In this paper, we use concepts from supervisory control theory of discrete event systems to propose a method to learn optimal control policies for a finite-state Markov Decision Process (MDP) in which (only) certain sequences of actions are deemed unsafe (respectively, safe). We assume that the set of action sequences deemed unsafe and/or safe is given in terms of a finite-state automaton, and we propose a supervisor that disables a subset of actions at every state of the MDP so that the constraints on action sequences are satisfied. We then present a version of the Q-learning algorithm for learning optimal policies in the presence of non-Markovian action-sequence and state constraints, where we use reward machines to handle the state constraints. We illustrate the method with an example that captures the utility of automata-based methods for non-Markovian state and action specifications in reinforcement learning, and we present simulation results in this setting.
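The idea of a supervisor that disables actions, combined with Q-learning, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: a hypothetical 1-D chain MDP (`mdp_step`, states 0 to 4, reward 1 on reaching state 4), a hypothetical action-constraint automaton (`dfa_step`) that forbids taking `"right"` twice in a row, and hyperparameters chosen arbitrarily. Learning runs over the product state (MDP state, automaton state), and `supervisor` enables only actions whose automaton transition avoids the unsafe sink. The reward-machine treatment of state constraints is omitted.

```python
import random
from collections import defaultdict

ACTIONS = ("left", "right", "stay")
DEAD = -1  # unsafe sink state of the (hypothetical) action automaton


def dfa_step(q, a):
    """Hypothetical action-constraint DFA: 'right' may never be taken twice in a row.
    q = 0: previous action was not 'right'; q = 1: previous action was 'right'."""
    if q == DEAD:
        return DEAD
    if a == "right":
        return DEAD if q == 1 else 1
    return 0


def supervisor(q):
    """Disable every action whose DFA transition would enter the unsafe sink."""
    return [a for a in ACTIONS if dfa_step(q, a) != DEAD]


# Hypothetical chain MDP: states 0..4; reaching state 4 pays 1 and ends the episode.
N, GOAL = 5, 4
MOVES = {"left": -1, "right": 1, "stay": 0}


def mdp_step(s, a):
    s2 = min(max(s + MOVES[a], 0), N - 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL


def q_learning(episodes=2000, alpha=0.1, gamma=0.9, eps=0.2, max_steps=100, seed=0):
    """Tabular Q-learning on the product state (s, q), restricted to supervisor-enabled actions."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    for _ in range(episodes):
        s, q = 0, 0
        for _ in range(max_steps):
            allowed = supervisor(q)
            if rng.random() < eps:
                a = rng.choice(allowed)
            else:  # greedy among enabled actions, breaking ties at random
                best = max(Q[(s, q, b)] for b in allowed)
                a = rng.choice([b for b in allowed if Q[(s, q, b)] == best])
            s2, r, done = mdp_step(s, a)
            q2 = dfa_step(q, a)
            # Bootstrap only over actions the supervisor enables in the next product state.
            target = r if done else r + gamma * max(Q[(s2, q2, b)] for b in supervisor(q2))
            Q[(s, q, a)] += alpha * (target - Q[(s, q, a)])
            s, q = s2, q2
            if done:
                break
    return Q


def greedy_rollout(Q, max_steps=20):
    """Follow the learned greedy policy from (0, 0); return the action trace and final MDP state."""
    s, q, trace = 0, 0, []
    for _ in range(max_steps):
        a = max(supervisor(q), key=lambda b: Q[(s, q, b)])
        trace.append(a)
        s, _, done = mdp_step(s, a)
        q = dfa_step(q, a)
        if done:
            break
    return trace, s
```

Calling `trace, final_state = greedy_rollout(q_learning())` yields an action sequence that never contains two consecutive `"right"` actions. This holds by construction, because the supervisor removes the offending action before the agent chooses, rather than penalizing it after the fact. Bootstrapping over `supervisor(q2)` keeps the value estimates consistent with the restricted action sets.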
