Frequentist Regret Bounds for Randomized Least-Squares Value Iteration

11/01/2019
by Andrea Zanette et al.

We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning (RL). When the state space is large or continuous, traditional tabular approaches are infeasible and some form of function approximation is mandatory. In this paper, we introduce an optimistically-initialized variant of the popular randomized least-squares value iteration (RLSVI), a model-free algorithm in which exploration is induced by perturbing the least-squares approximation of the action-value function. Under the assumption that the Markov decision process has low-rank transition dynamics, we prove that the frequentist regret of RLSVI is upper-bounded by Õ(d^2 H^2 √T), where d is the feature dimension, H is the horizon, and T is the total number of steps. To the best of our knowledge, this is the first frequentist regret analysis of randomized exploration with function approximation.
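To make the perturbation idea concrete, the sketch below implements the core loop of RLSVI with linear features: a backward pass of ridge regressions over the data collected at each step, with each least-squares solution perturbed by Gaussian noise whose covariance is proportional to the inverse regularized Gram matrix, so the noise is largest in directions the data constrains poorly. This is a minimal sketch, not the paper's exact algorithm; in particular it omits the optimistic initialization and the specific noise scaling of the variant analyzed in the paper, and the interface names (phi, buffers, rlsvi_plan, sigma, lam) are assumptions made for illustration.

```python
import numpy as np

def rlsvi_plan(phi, buffers, H, d, num_actions, lam=1.0, sigma=1.0, rng=None):
    """One planning pass of RLSVI: backward perturbed least-squares.

    phi(s, a)  -> length-d feature vector (assumed interface).
    buffers[h] -> list of (s, a, r, s_next) transitions collected at step h.
    Returns a list of H parameter vectors, one per step.
    """
    rng = rng or np.random.default_rng()
    theta = [np.zeros(d) for _ in range(H + 1)]   # theta[H] = 0 (terminal value)

    for h in reversed(range(H)):
        trans = buffers[h]
        X = np.array([phi(s, a) for s, a, _, _ in trans]).reshape(len(trans), d)
        # Regression targets: reward plus greedy value under the next step's
        # (already perturbed) parameters.
        y = np.array([r + max(phi(sn, b) @ theta[h + 1]
                              for b in range(num_actions))
                      for _, _, r, sn in trans])
        A = lam * np.eye(d) + X.T @ X             # regularized Gram matrix
        theta_hat = np.linalg.solve(A, X.T @ y)   # ridge regression solution
        # Randomized exploration: perturb the least-squares solution with
        # Gaussian noise shaped by the inverse Gram matrix.
        theta[h] = rng.multivariate_normal(theta_hat,
                                           sigma**2 * np.linalg.inv(A))

    return theta[:H]
```

After each planning pass the agent simply acts greedily with respect to the perturbed estimates, e.g. a_h = argmax_a phi(s, a) @ theta[h]; because the perturbation is resampled every episode, greedy behavior alone is enough to drive exploration, with no explicit bonus term.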


Related research

06/15/2021
Randomized Exploration for Reinforcement Learning with General Value Function Approximation
We propose a model-free reinforcement learning algorithm inspired by the...

10/23/2020
Improved Worst-Case Regret Bounds for Randomized Least-Squares Value Iteration
This paper studies regret minimization with randomized value functions i...

12/12/2019
Provably Efficient Exploration in Policy Optimization
While policy-based reinforcement learning (RL) achieves tremendous succe...

11/16/2020
No-Regret Reinforcement Learning with Value Function Approximation: a Kernel Embedding Approach
We consider the regret minimisation problem in reinforcement learning (R...

05/15/2018
Feedback-Based Tree Search for Reinforcement Learning
Inspired by recent successes of Monte-Carlo tree search (MCTS) in a numb...

06/19/2023
Least Square Value Iteration is Robust Under Locally Bounded Misspecification Error
The success of reinforcement learning heavily relies on the function app...

10/15/2018
Successor Uncertainties: exploration and uncertainty in temporal difference learning
We consider the problem of balancing exploration and exploitation in seq...
