Posterior Sampling for Continuing Environments

by Wanqiao Xu, et al.
Stanford University

We develop an extension of posterior sampling for reinforcement learning (PSRL) that is suited for a continuing agent-environment interface and integrates naturally into agent designs that scale to complex environments. The approach maintains a statistically plausible model of the environment and follows a policy that maximizes expected γ-discounted return in that model. At each time, with probability 1-γ, the model is replaced by a sample from the posterior distribution over environments. For a suitable schedule of γ, we establish an Õ(τ S √(A T)) bound on the Bayesian regret, where S is the number of environment states, A is the number of actions, and τ denotes the reward averaging time, which is a bound on the duration required to accurately estimate the average reward of any policy.
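The resampling rule described in the abstract can be illustrated with a minimal tabular sketch. This is not the authors' implementation. It assumes a known reward table `R`, an independent uniform Dirichlet prior over each transition row, plain value iteration as the planner, and a fixed γ rather than the schedule used for the regret bound. All function names here (`sample_transitions`, `greedy_policy`, `run_continuing_psrl`) are hypothetical.

```python
import random


def sample_transitions(counts, rng):
    """Draw one transition model from a Dirichlet posterior.

    counts[s][a][t] holds prior pseudo-counts plus observed transitions
    (an assumed posterior form; the abstract does not fix a prior).
    """
    model = []
    for s_counts in counts:
        row = []
        for sa_counts in s_counts:
            # Normalised Gamma draws give a Dirichlet sample.
            g = [rng.gammavariate(c, 1.0) for c in sa_counts]
            total = sum(g)
            row.append([x / total for x in g])
        model.append(row)
    return model


def greedy_policy(P, R, gamma, iters=200):
    """Approximate the gamma-discounted optimal policy by value iteration."""
    S, A = len(R), len(R[0])
    V = [0.0] * S

    def q(s, a):
        return R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(S))

    for _ in range(iters):
        V = [max(q(s, a) for a in range(A)) for s in range(S)]
    return [max(range(A), key=lambda a: q(s, a)) for s in range(S)]


def run_continuing_psrl(true_P, R, gamma, T, seed=0):
    """Run T steps with no episodes or resets; return (average reward, resamples)."""
    rng = random.Random(seed)
    S, A = len(R), len(R[0])
    # Uniform Dirichlet(1, ..., 1) prior on every (state, action) row (assumption).
    counts = [[[1.0] * S for _ in range(A)] for _ in range(S)]
    policy = greedy_policy(sample_transitions(counts, rng), R, gamma)
    s, total_reward, resamples = 0, 0.0, 0
    for _ in range(T):
        # With probability 1 - gamma, replace the model by a fresh posterior sample.
        if rng.random() < 1.0 - gamma:
            policy = greedy_policy(sample_transitions(counts, rng), R, gamma)
            resamples += 1
        a = policy[s]
        s_next = rng.choices(range(S), weights=true_P[s][a])[0]
        total_reward += R[s][a]
        counts[s][a][s_next] += 1.0  # posterior update from the observed transition
        s = s_next
    return total_reward / T, resamples
```

Calling `run_continuing_psrl(true_P, R, gamma=0.9, T=500)` on a small environment resamples the model roughly every 1/(1-γ) = 10 steps on average, so γ plays the role that episode length plays in episodic PSRL.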




(More) Efficient Reinforcement Learning via Posterior Sampling

Most provably-efficient learning algorithms introduce optimism about poo...

Why is Posterior Sampling Better than Optimism for Reinforcement Learning?

Computational results demonstrate that posterior sampling for reinforcem...

Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees

We consider reinforcement learning in an environment modeled by an episo...

Joint Learning of Reward Machines and Policies in Environments with Partially Known Semantics

We study the problem of reinforcement learning for a task encoded by a r...

Simple Agent, Complex Environment: Efficient Reinforcement Learning with Agent State

We design a simple reinforcement learning agent that, with a specificati...

Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning

In this work, we propose a novel Kernelized Stein Discrepancy-based Post...

Recall Traces: Backtracking Models for Efficient Reinforcement Learning

In many environments only a tiny subset of all states yield high reward....
