On the Expressivity of Markov Reward

by David Abel, et al.
Brown University; Princeton University

Reward is the driving force for reinforcement-learning agents. This paper is dedicated to understanding the expressivity of reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract notions of "task" that might be desirable: (1) a set of acceptable behaviors, (2) a partial ordering over behaviors, or (3) a partial ordering over trajectories. Our main results prove that while reward can express many of these tasks, there exist instances of each task type that no Markov reward function can capture. We then provide a set of polynomial-time algorithms that construct a Markov reward function that allows an agent to optimize tasks of each of these three types, and correctly determine when no such reward function exists. We conclude with an empirical study that corroborates and illustrates our theoretical findings.
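The abstract mentions polynomial-time algorithms that either construct a suitable Markov reward function or determine that none exists. For the first task type (a set of acceptable behaviors), one natural realization is a linear-programming feasibility check: the start-state value of any fixed policy is linear in the reward vector, so we can ask an LP solver for a reward under which every acceptable policy strictly outvalues every unacceptable one. The sketch below is an illustration of that idea on a toy two-state MDP, not the paper's exact algorithm; the MDP, the "acceptable" set, and all variable names are invented for the example.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

# Toy deterministic MDP: 2 states, 2 actions.
# Action 0 stays in the current state, action 1 switches states.
S, A, gamma, s0 = 2, 2, 0.9, 0
P = np.zeros((S, A, S))
for s in range(S):
    P[s, 0, s] = 1.0          # stay
    P[s, 1, 1 - s] = 1.0      # switch

def start_value_coeffs(pi):
    """Coefficients c with V^pi(s0) = c @ r, where r is indexed by s*A + a.

    Uses V^pi = (I - gamma * P_pi)^{-1} R_pi, which is linear in the reward.
    """
    P_pi = np.array([P[s, pi[s]] for s in range(S)])   # S x S transition matrix
    M = np.zeros((S, S * A))                           # selects r[s, pi[s]]
    for s in range(S):
        M[s, s * A + pi[s]] = 1.0
    return np.linalg.solve(np.eye(S) - gamma * P_pi, M)[s0]

policies = list(itertools.product(range(A), repeat=S))  # all deterministic policies
good = [pi for pi in policies if pi[0] == 0]            # "acceptable": stay in state 0
bad = [pi for pi in policies if pi not in good]

# LP variables: the reward vector r (S*A entries) plus a margin eps.
# Maximize eps subject to V^g(s0) >= V^b(s0) + eps for every good/bad pair,
# with rewards bounded in [-1, 1] so the LP stays bounded.
n = S * A
c = np.zeros(n + 1)
c[-1] = -1.0                                            # linprog minimizes, so -eps
A_ub, b_ub = [], []
for g in good:
    cg = start_value_coeffs(g)
    for b in bad:
        cb = start_value_coeffs(b)
        A_ub.append(np.append(cb - cg, 1.0))            # cb@r - cg@r + eps <= 0
        b_ub.append(0.0)
res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(-1, 1)] * n + [(0, 10)])
r, eps = res.x[:n], res.x[-1]
```

If the solver returns `eps > 0`, a Markov reward realizing the task exists (the recovered `r` is one witness); an infeasible LP, or an optimum of `eps = 0`, signals that no bounded Markov reward can make every acceptable policy strictly optimal, matching the paper's impossibility cases.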




