Least Square Value Iteration is Robust Under Locally Bounded Misspecification Error

by Yunfan Li et al.

The success of reinforcement learning heavily relies on function approximation of policies, values, or models, where misspecification (a mismatch between the ground truth and the best function approximator) naturally occurs, especially when the ground truth is complex. Since the misspecification error does not vanish even with an infinite number of samples, designing algorithms that are robust under misspecification is of paramount importance. Recently, it has been shown that policy-based approaches can be robust even when the policy function approximation suffers a large locally bounded misspecification error, under which the function class may have Ω(1) approximation error at certain states and actions but the error is small on average under a policy-induced state distribution. In contrast, value-based approaches are only known to learn effectively under a globally bounded misspecification error, i.e., when the approximation errors to the value functions have a uniform upper bound over all state-action pairs. It remains an open question whether similar robustness can be achieved with value-based approaches. In this paper, we answer this question affirmatively by showing that Least-Square Value Iteration [Jin et al., 2020], with a carefully designed exploration bonus, achieves robustness under a local misspecification error bound. In particular, we show that the algorithm achieves a regret bound of O(√(d^3KH^4) + dKH^2ζ), where d is the dimension of the linear features, H is the length of the episode, K is the total number of episodes, and ζ is the local bound on the misspecification error. Moreover, we show that the algorithm achieves the same regret bound without knowing ζ, and that it can serve as a robust policy-evaluation oracle to improve the sample complexity of policy-based approaches.
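As a rough illustration of the kind of update the abstract refers to (not the paper's exact algorithm, bonus constants, or misspecification handling), one least-squares value-iteration backup with an elliptical confidence bonus, in the style of LSVI-UCB [Jin et al., 2020], can be sketched as follows. The function name `lsvi_ucb_q`, the bonus scale `beta`, and the clipping constant `h_max` are illustrative assumptions.

```python
import numpy as np

def lsvi_ucb_q(phi, rewards, next_values, lam=1.0, beta=0.1, h_max=1.0):
    """One least-squares value-iteration backup with a UCB-style bonus.

    phi         : (n, d) array of features phi(s_k, a_k) for observed pairs
    rewards     : (n,) observed rewards r_k
    next_values : (n,) estimated next-state values V_{h+1}(s'_k)
    Returns a function q(x) giving the optimistic Q estimate at feature x.
    """
    n, d = phi.shape
    # Regularized Gram matrix: Lambda = lam * I + sum_k phi_k phi_k^T
    Lam = lam * np.eye(d) + phi.T @ phi
    Lam_inv = np.linalg.inv(Lam)
    # Ridge-regression weights: w = Lambda^{-1} sum_k phi_k (r_k + V_{h+1}(s'_k))
    targets = rewards + next_values
    w = Lam_inv @ (phi.T @ targets)

    def q(x):
        # Optimism: linear prediction plus elliptical bonus beta * ||x||_{Lambda^{-1}},
        # clipped at h_max as in finite-horizon LSVI variants.
        bonus = beta * np.sqrt(x @ Lam_inv @ x)
        return float(min(x @ w + bonus, h_max))

    return q
```

With `beta = 0` this is a plain regularized least-squares backup; a positive `beta` inflates the estimate in directions the data covers poorly, which is the mechanism the exploration bonus in the abstract builds on.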



Related papers:

- Provably Efficient Reinforcement Learning with Linear Function Approximation
- Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity
- Randomized Exploration for Reinforcement Learning with General Value Function Approximation
- Frequentist Regret Bounds for Randomized Least-Squares Value Iteration
- Online Sub-Sampling for Reinforcement Learning with General Function Approximation
- Robust Reinforcement Learning: A Case Study in Linear Quadratic Regulation
- Robust Policy Iteration for Continuous-time Linear Quadratic Regulation
