No-Regret Reinforcement Learning with Value Function Approximation: a Kernel Embedding Approach

by Sayak Ray Chowdhury, et al.

We consider the regret minimisation problem in reinforcement learning (RL) in the episodic setting. In many real-world RL environments, the state and action spaces are continuous or very large. Existing approaches establish regret guarantees either through a low-dimensional representation of the probability transition model or through functional approximation of the Q-functions. However, the understanding of function approximation schemes for state-value functions is largely missing. In this paper, we propose an online model-based RL algorithm, namely CME-RL, that learns representations of transition distributions as embeddings in a reproducing kernel Hilbert space (RKHS) while carefully balancing the exploitation-exploration tradeoff. We demonstrate the efficiency of our algorithm by proving a frequentist (worst-case) regret bound of order Õ(Hγ_N√N), where Õ(·) hides only absolute constants and poly-logarithmic factors, H is the episode length, N is the total number of time steps, and γ_N is an information-theoretic quantity capturing the effective dimension of the state-action feature space. Our method bypasses the need for estimating transition probabilities and applies to any domain on which kernels can be defined. It also brings new insights into the general theory of kernel methods for approximate inference and RL regret minimisation.
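To make the conditional mean embedding (CME) idea concrete, here is a minimal sketch of the empirical CME estimator that underlies this family of methods: given samples (x_i, y_i), the embedding of P(Y | X = x) is approximated by kernel-ridge-regression weights α(x) = (K + nλI)⁻¹ k_X(x), so that E[f(Y) | X = x] ≈ Σ_i α_i(x) f(y_i). The Gaussian kernel, the `lam` and `gamma` parameters, and the toy data below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of A and B
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

def cme_weights(X_train, x_query, lam=1e-3, gamma=1.0):
    """Weights alpha(x) of the empirical conditional mean embedding:
    mu(. | x) ~= sum_i alpha_i(x) k(., y_i),
    with alpha(x) = (K + n*lam*I)^{-1} k_X(x)  (kernel ridge regression)."""
    n = X_train.shape[0]
    K = rbf_kernel(X_train, X_train, gamma)        # Gram matrix on inputs
    k_x = rbf_kernel(X_train, x_query, gamma)      # kernel evaluations at query
    return np.linalg.solve(K + n * lam * np.eye(n), k_x)

# Toy check (hypothetical data): for Y = sin(X) + noise,
# the CME estimate of E[Y | X = x] should be close to sin(x).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
Y = np.sin(X) + 0.05 * rng.standard_normal((200, 1))
alpha = cme_weights(X, np.array([[1.0]]), lam=1e-3, gamma=2.0)
est = (alpha.T @ Y).item()   # estimate of E[Y | X = 1.0]
```

The same weights α(x) estimate E[f(Y) | X = x] for any function f evaluated on the sampled y_i, which is what lets a model-based algorithm like CME-RL back up value functions without ever forming an explicit transition probability estimate.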




