Least Square Value Iteration is Robust Under Locally Bounded Misspecification Error

06/19/2023
by Yunfan Li et al.

The success of reinforcement learning relies heavily on function approximation of policies, values, or models, where misspecification (a mismatch between the ground truth and the best approximator in the function class) naturally occurs, especially when the ground truth is complex. Because misspecification error does not vanish even with an infinite number of samples, designing algorithms that are robust under misspecification is of paramount importance. Recently, it has been shown that policy-based approaches can be robust even when the policy function approximation suffers a large locally bounded misspecification error, under which the function class may have Ω(1) approximation error on certain states and actions but only a small error on average under a policy-induced state distribution; in contrast, value-based approaches are only known to learn effectively under a globally bounded misspecification error, i.e., when the approximation errors to the value functions have a uniform upper bound over all state-action pairs. It remains an open question whether similar robustness can be achieved with value-based approaches. In this paper, we answer this question affirmatively by showing that Least-Squares Value Iteration [Jin et al., 2020], equipped with a carefully designed exploration bonus, achieves robustness under a local misspecification error bound. In particular, we show that the algorithm achieves a regret bound of O(√(d^3KH^4) + dKH^2ζ), where d is the dimension of the linear features, H is the length of an episode, K is the total number of episodes, and ζ is the local bound on the misspecification error. Moreover, we show that the algorithm can achieve the same regret bound without knowing ζ, and that it can serve as a robust policy evaluation oracle, which in turn can be used to improve the sample complexity of policy-based approaches.
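For context, the algorithm builds on LSVI-UCB of Jin et al. (2020): at each step it fits the Q-function by regularized least squares on the features and adds an elliptical-potential exploration bonus. Below is a minimal sketch of that standard backward-induction update, under several assumptions of ours: a generic feature map phi, a bonus coefficient beta, a finite action set, and a toy data layout are all illustrative placeholders, and the paper's actual contribution (the modified bonus that additionally absorbs the local misspecification error ζ) is not reproduced here, since the abstract does not specify it.

```python
import numpy as np

def lsvi_ucb(data, phi, d, H, beta, lam=1.0, actions=(0, 1)):
    """One round of least-squares value iteration with a UCB bonus.

    data: list over steps h = 0..H-1 of (s, a, r, s_next) tuples seen so far.
    phi:  feature map phi(s, a) -> np.ndarray of shape (d,).
    Returns Q(h, s, a) = min(w_h @ phi(s, a) + beta * ||phi(s, a)||_{Lambda_h^{-1}}, H).
    """
    ws = [np.zeros(d) for _ in range(H)]
    Lams_inv = [np.eye(d) / lam for _ in range(H)]

    def q_value(h, s, a):
        if h >= H:  # beyond the horizon the value is zero
            return 0.0
        f = phi(s, a)
        bonus = beta * np.sqrt(f @ Lams_inv[h] @ f)  # elliptical bonus
        return min(ws[h] @ f + bonus, H)  # truncate at H, the max return

    # Backward induction: regress targets r + max_a Q_{h+1} onto features.
    for h in range(H - 1, -1, -1):
        Lam = lam * np.eye(d)
        b = np.zeros(d)
        for (s, a, r, s_next) in data[h]:
            f = phi(s, a)
            Lam += np.outer(f, f)
            target = r + max(q_value(h + 1, s_next, a2) for a2 in actions)
            b += f * target
        Lams_inv[h] = np.linalg.inv(Lam)
        ws[h] = Lams_inv[h] @ b

    return q_value

# Toy usage with a hand-rolled feature map and random transitions
# (purely illustrative, not the paper's experimental setup):
rng = np.random.default_rng(0)
H = 3
phi = lambda s, a: np.array([s, a, s * a, 1.0]) / 2.0  # d = 4
data = [[(int(rng.integers(2)), int(rng.integers(2)),
          float(rng.random()), int(rng.integers(2)))
         for _ in range(20)] for _ in range(H)]
Q = lsvi_ucb(data, phi, d=4, H=H, beta=1.0)
print(Q(0, 1, 1))  # optimistic Q-estimate at step 0 for state 1, action 1
```

In the paper's setting, the bonus coefficient would be tuned so that optimism holds even when the regression targets carry a locally bounded bias ζ, which is what yields the additive dKH^2ζ term in the regret.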


Related research:

- Provably Efficient Reinforcement Learning with Linear Function Approximation (07/11/2019)
- Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity (02/17/2020)
- Randomized Exploration for Reinforcement Learning with General Value Function Approximation (06/15/2021)
- Frequentist Regret Bounds for Randomized Least-Squares Value Iteration (11/01/2019)
- Online Sub-Sampling for Reinforcement Learning with General Function Approximation (06/14/2021)
- Robust Reinforcement Learning: A Case Study in Linear Quadratic Regulation (08/25/2020)
- Robust Policy Iteration for Continuous-time Linear Quadratic Regulation (05/19/2020)
