Entropy Maximization for Markov Decision Processes Under Temporal Logic Constraints

by Yagiz Savas, et al.

We study the problem of synthesizing a policy that maximizes the entropy of a Markov decision process (MDP) subject to a temporal logic constraint. Such a policy minimizes the predictability of the paths it generates or, dually, maximizes the continual exploration of different paths in an MDP while ensuring the satisfaction of a temporal logic specification. We first show that the maximum entropy of an MDP can be finite, infinite or unbounded, and we provide necessary and sufficient conditions for each of these cases. We then present an algorithm to synthesize a policy that maximizes the entropy of an MDP. The proposed algorithm is based on a convex optimization problem and runs in time polynomial in the size of the MDP. We also show that maximizing the entropy of an MDP is equivalent to maximizing the entropy of the paths that reach a certain set of states in the MDP. Finally, we extend the algorithm to an MDP subject to a temporal logic specification. In numerical examples, we demonstrate the proposed method on different motion planning scenarios and illustrate that, as the restrictions imposed on the paths by a specification increase, the maximum entropy decreases, which in turn increases the predictability of the paths.
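To make the notion of "entropy of the paths" concrete, the following is a minimal sketch, not the paper's synthesis algorithm. It only evaluates the path entropy of a fixed Markov chain, such as the one an MDP induces under a fixed policy, and it assumes the chain reaches an absorbing set with probability 1. It relies on a standard identity for absorbing chains: the path entropy equals the sum over transient states of the expected number of visits times the entropy of that state's next-state distribution. The names `local_entropy`, `expected_visits`, `path_entropy`, and the toy chain `loop` are hypothetical and chosen for illustration.

```python
import math


def local_entropy(row):
    """Entropy (in nats) of a single state's next-state distribution."""
    return -sum(p * math.log(p) for p in row.values() if p > 0.0)


def expected_visits(chain, initial, absorbing, tol=1e-12, max_iter=100_000):
    """Expected number of visits to each transient state.

    `chain` maps each transient state to a dict {next_state: probability}.
    The occupancy mass is propagated step by step until almost all of it
    has been absorbed (assumption: absorption happens with probability 1).
    """
    visits = {s: 0.0 for s in chain if s not in absorbing}
    mass = {initial: 1.0} if initial not in absorbing else {}
    for _ in range(max_iter):
        if sum(mass.values()) < tol:
            break
        nxt = {}
        for s, m in mass.items():
            visits[s] += m
            for t, p in chain[s].items():
                if t not in absorbing:
                    nxt[t] = nxt.get(t, 0.0) + m * p
        mass = nxt
    return visits


def path_entropy(chain, initial, absorbing):
    """Path entropy as the visit-weighted sum of local entropies."""
    visits = expected_visits(chain, initial, absorbing)
    return sum(visits[s] * local_entropy(chain[s]) for s in visits)


# Toy chain (illustrative): stay in "s" or move to "goal", each with probability 0.5.
loop = {"s": {"s": 0.5, "goal": 0.5}}
print(path_entropy(loop, "s", {"goal"}))
```

In this toy chain the state "s" is visited twice in expectation and each visit contributes ln 2 nats, so the printed value is close to 2 ln 2. Making the self-loop more likely increases the expected number of visits, which gives some intuition for how the maximum entropy of an MDP can fail to be finite.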




LTL-Constrained Steady-State Policy Synthesis

Decision-making policies for agents are often synthesized with the const...

Implementation and Comparison of Solution Methods for Decision Processes with Non-Markovian Rewards

This paper examines a number of solution methods for decision processes ...

Provably Efficient Maximum Entropy Exploration

Suppose an agent is in a (possibly unknown) Markov decision process (MDP...

Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic

This paper investigates the motion planning of autonomous dynamical syst...

LTLf Synthesis on Probabilistic Systems

Many systems are naturally modeled as Markov Decision Processes (MDPs), ...

Accelerated Reinforcement Learning for Temporal Logic Control Objectives

This paper addresses the problem of learning control policies for mobile...

Controller Synthesis for Omega-Regular and Steady-State Specifications

Given a Markov decision process (MDP) and a linear-time (ω-regular or LT...
