On the Expressivity of Markov Reward

11/01/2021
by David Abel, et al.
Brown University
Princeton University

Reward is the driving force for reinforcement-learning agents. This paper is dedicated to understanding the expressivity of reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract notions of "task" that might be desirable: (1) a set of acceptable behaviors, (2) a partial ordering over behaviors, or (3) a partial ordering over trajectories. Our main results prove that while reward can express many of these tasks, there exist instances of each task type that no Markov reward function can capture. We then provide a set of polynomial-time algorithms that construct a Markov reward function that allows an agent to optimize tasks of each of these three types, and correctly determine when no such reward function exists. We conclude with an empirical study that corroborates and illustrates our theoretical findings.
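
To make the first task type concrete, the following is a minimal sketch of how a set of acceptable behaviors can fail to be expressible by any Markov reward. Everything in it is an illustrative assumption rather than the paper's construction: the two-state environment whose transitions ignore the action, the criterion that every acceptable policy must have strictly higher start-state value than every other policy, and the helper names `start_value`, `realizes`, and `random_search`. The random search is only a stand-in for demonstration; it is not the polynomial-time algorithm the abstract refers to.

```python
import itertools
import random

STATES = (0, 1)
ACTIONS = (0, 1)
GAMMA = 0.9


def next_state(state, action):
    # Assumed dynamics: the agent alternates between the two states
    # regardless of which action it picks.
    return 1 - state


def start_value(policy, reward, start=0, horizon=500):
    """Truncated discounted return of a deterministic policy from `start`."""
    total, discount, state = 0.0, 1.0, start
    for _ in range(horizon):
        action = policy[state]
        total += discount * reward[(state, action)]
        discount *= GAMMA
        state = next_state(state, action)
    return total


def realizes(reward, good, tol=1e-9):
    """True if every acceptable policy strictly beats every other policy."""
    all_policies = itertools.product(ACTIONS, repeat=len(STATES))
    bad = [p for p in all_policies if p not in good]
    worst_good = min(start_value(p, reward) for p in good)
    best_bad = max(start_value(p, reward) for p in bad)
    return worst_good > best_bad + tol


def random_search(good, trials=2000, seed=0):
    """Sample Markov rewards r(s, a); return one that realizes `good`, if any."""
    rng = random.Random(seed)
    for _ in range(trials):
        reward = {(s, a): rng.uniform(-1, 1) for s in STATES for a in ACTIONS}
        if realizes(reward, good):
            return reward
    return None


# A policy is a tuple (action in state 0, action in state 1).
XOR_TASK = {(0, 1), (1, 0)}   # "use different actions in the two states"
SIMPLE_TASK = {(0, 0)}        # "always use action 0"

xor_reward = random_search(XOR_TASK)
simple_reward = random_search(SIMPLE_TASK)
```

In this assumed environment the value of a policy is a positively weighted sum of one reward term per state, so the two acceptable policies of `XOR_TASK` always have the same total value as the two unacceptable ones. Both acceptable policies therefore cannot strictly beat both unacceptable ones, and the search finds nothing, whereas `SIMPLE_TASK` is realized by any reward that prefers action 0 in each state.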


07/22/2023

On the Expressivity of Multidimensional Markov Reward

We consider the expressivity of Markov rewards in sequential decision ma...
02/05/2021

Deceptive Reinforcement Learning for Privacy-Preserving Planning

In this paper, we study the problem of deceptive reinforcement learning ...
09/19/2022

Understanding reinforcement learned crowds

Simulating trajectories of virtual crowds is a commonly encountered task...
08/24/2023

Predator-prey survival pressure is sufficient to evolve swarming behaviors

The comprehension of how local interactions arise in global collective b...
06/02/2022

Learning Soft Constraints From Constrained Expert Demonstrations

Inverse reinforcement learning (IRL) methods assume that the expert data...
10/29/2021

Learning to Be Cautious

A key challenge in the field of reinforcement learning is to develop age...
05/12/2021

Acting upon Imagination: when to trust imagined trajectories in model based reinforcement learning

Model based reinforcement learning (MBRL) uses an imperfect model of the...
