Reconciling Rewards with Predictive State Representations

06/07/2021
by Andrea Baisero, et al.

Predictive state representations (PSRs) are models of controlled non-Markov observation sequences that exhibit the same generative process governing POMDP observations without relying on an underlying latent state. In that respect, a PSR is indistinguishable from the corresponding POMDP. However, PSRs notoriously ignore the notion of rewards, which undermines the general utility of PSR models for control, planning, or reinforcement learning. Therefore, we describe a necessary and sufficient accuracy condition that determines whether a PSR is able to accurately model POMDP rewards, we show that rewards can be approximated even when the accuracy condition is not satisfied, and we find that a non-trivial number of POMDPs taken from a well-known third-party repository do not satisfy the accuracy condition. We propose reward-predictive state representations (R-PSRs), a generalization of PSRs that accurately models both observations and rewards, and develop value iteration for R-PSRs. We show that there is a mismatch between optimal POMDP policies and the optimal PSR policies derived from approximate rewards. On the other hand, optimal R-PSR policies perfectly match optimal POMDP policies, reconfirming R-PSRs as accurate state-less generative models of observations and rewards.
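The abstract's claim that a PSR generates observations without a latent state can be made concrete with a minimal sketch of the standard linear-PSR machinery: the state is a vector of test predictions b, each observation probability is a linear function m_{a,o}ᵀb, and the state updates by a matrix M_{a,o}. Everything below is an illustrative assumption (a one-dimensional PSR for a memoryless toy process), not the paper's construction:

```python
import numpy as np

# Toy linear PSR for a memoryless observation process with 2 actions and
# 2 observations. All parameters here are hypothetical, chosen only so
# that probabilities per action sum to 1.
#
# Linear-PSR equations:
#   p(o | history, a) = m[a, o] @ b
#   b'               = (M[a, o] @ b) / (m[a, o] @ b)

m = {  # prediction weight vectors m_{a,o}
    (0, 0): np.array([0.7]), (0, 1): np.array([0.3]),
    (1, 0): np.array([0.2]), (1, 1): np.array([0.8]),
}
M = {k: v.reshape(1, 1) for k, v in m.items()}  # update matrices M_{a,o}

def predict(b, a, o):
    """Probability of observing o after taking action a in PSR state b."""
    return float(m[(a, o)] @ b)

def update(b, a, o):
    """PSR state update after executing action a and observing o."""
    return (M[(a, o)] @ b) / (m[(a, o)] @ b)

b = np.array([1.0])       # initial prediction vector
p = predict(b, a=0, o=0)  # 0.7 for this toy model
b = update(b, a=0, o=0)   # stays [1.0]: the process is memoryless
```

Because the process is memoryless, the prediction vector is fixed; in a genuinely non-Markov process the update would move b through the space of test predictions, which is exactly the state the paper argues lacks any reward information.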


