Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning

by Souradip Chakraborty et al.

In this work, we propose a novel Kernelized Stein Discrepancy-based Posterior Sampling for RL algorithm, which extends model-based RL based upon posterior sampling (PSRL) in several ways: we (i) relax the need for any smoothness or Gaussian assumptions, allowing for complex mixture models; (ii) ensure applicability to large-scale training by incorporating a compression step so that the posterior consists of a Bayesian coreset of only statistically significant past state-action pairs; and (iii) develop a novel regret analysis of PSRL based upon integral probability metrics, which, under a smoothness condition on the constructed posterior, can be evaluated in closed form as the kernelized Stein discrepancy (KSD). Consequently, we improve the š’Ŗ(H^{3/2} d āˆšT) regret of PSRL to š’Ŗ(H^{3/2} āˆšT), where d is the input dimension, H is the episode length, and T is the total number of episodes experienced, alleviating a linear dependence on d. Moreover, we theoretically establish a trade-off between the regret rate and the representational complexity of the posterior by introducing a KSD-based compression budget parameter ϵ, and establish a lower bound on the complexity required for consistency of the model. Experimentally, we observe that this approach is competitive with several state-of-the-art RL methodologies, with substantive improvements in computation time, achieving up to a 50% reduction in wall-clock time in some continuous control environments.
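To make the two central quantities concrete, the following is a minimal numpy sketch of (a) a V-statistic estimate of the kernelized Stein discrepancy and (b) a greedy ϵ-budget pruning loop in the spirit of the coreset compression step. This is not the paper's implementation: it assumes a standard-normal target (so the score is s(x) = -x), an RBF base kernel, and a naive greedy rule; all function names are hypothetical.

```python
import numpy as np

def stein_kernel(X, h=1.0):
    """Stein-kernel matrix for a standard-normal target (score s(x) = -x)
    with an RBF base kernel of bandwidth h. Illustrative helper only."""
    n, d = X.shape
    S = -X                                   # score of N(0, I) at each point
    diff = X[:, None, :] - X[None, :, :]     # pairwise differences x_i - x_j
    sq = np.sum(diff ** 2, axis=-1)
    K = np.exp(-sq / (2 * h ** 2))           # RBF base kernel matrix
    t1 = (S @ S.T) * K                                   # s(x)^T s(y) k(x,y)
    t2 = np.einsum('id,ijd->ij', S, diff) * K / h ** 2   # s(x)^T grad_y k
    t3 = -np.einsum('jd,ijd->ij', S, diff) * K / h ** 2  # s(y)^T grad_x k
    t4 = (d / h ** 2 - sq / h ** 4) * K                  # trace of grad_x grad_y k
    return t1 + t2 + t3 + t4

def ksd(X, h=1.0):
    """V-statistic estimate of the KSD between samples X and the target."""
    return float(np.sqrt(max(np.mean(stein_kernel(X, h)), 0.0)))

def prune_coreset(X, eps, h=1.0):
    """Greedily drop points while the KSD of the remainder stays <= eps,
    mimicking the role of a compression budget (illustrative only)."""
    keep = np.arange(len(X))
    while len(keep) > 2:
        cands = [np.delete(keep, i) for i in range(len(keep))]
        vals = [ksd(X[c], h) for c in cands]
        j = int(np.argmin(vals))
        if vals[j] > eps:
            break                            # budget exhausted: stop pruning
        keep = cands[j]
    return keep
```

Samples drawn near the target yield a small KSD, while shifted samples yield a large one, which is what lets the discrepancy act as both a compression criterion and a quantity that appears in the regret analysis.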

