Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity

by Simon S. Du, et al.

This paper studies the problem of agnostic Q-learning with function approximation in deterministic systems where the optimal Q-function is approximable by a function in the class F with approximation error δ > 0. We propose a novel recursion-based algorithm and show that if δ = O(ρ/√(dim_E)), then one can find the optimal policy using O(dim_E) trajectories, where ρ is the gap between the optimal Q-value of the best actions and that of the second-best actions, and dim_E is the Eluder dimension of F. Our result has two implications: 1) In conjunction with the lower bound in [Du et al., ICLR 2020], our upper bound suggests that the condition δ = Θ(ρ/√(dim_E)) is necessary and sufficient for algorithms with polynomial sample complexity. 2) In conjunction with the lower bound in [Wen and Van Roy, NIPS 2013], our upper bound suggests that the sample complexity Θ(dim_E) is tight even in the agnostic setting. Therefore, we settle the open problem on agnostic Q-learning proposed in [Wen and Van Roy, NIPS 2013]. We further extend our algorithm to the stochastic reward setting and obtain similar results.
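
The quantities named in the abstract can be written out as follows. This is a sketch under assumptions, not the paper's own statement: the symbols Q^*, \mathcal{S}, and \mathcal{A} (optimal Q-function, state space, action space) do not appear in the abstract, and the uniform (sup-norm) form of the approximation error and the per-state form of the gap are one common reading of the verbal definitions.

```latex
% Assumed notation: Q^* = optimal Q-function, \mathcal{S} = states, \mathcal{A} = actions.

% Approximation error (agnostic setting): some f in F is uniformly \delta-close to Q^*.
\exists f \in \mathcal{F}: \quad
  \sup_{(s,a) \in \mathcal{S} \times \mathcal{A}} \bigl| f(s,a) - Q^*(s,a) \bigr| \le \delta

% Gap: smallest difference between the best and second-best optimal Q-values.
\rho = \inf_{s \in \mathcal{S}} \Bigl( \max_{a \in \mathcal{A}} Q^*(s,a)
  - \max_{a \,:\, Q^*(s,a) < \max_{a'} Q^*(s,a')} Q^*(s,a) \Bigr)

% Upper bound claimed in the abstract.
\delta = O\!\bigl( \rho / \sqrt{\dim_E} \bigr)
  \;\Longrightarrow\;
  \text{the optimal policy is found using } O(\dim_E) \text{ trajectories}
```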



Best Policy Identification in discounted MDPs: Problem-specific Sample Complexity

We investigate the problem of best-policy identification in discounted M...

Data Banzhaf: A Data Valuation Framework with Maximal Robustness to Learning Stochasticity

This paper studies the robustness of data valuation to noisy model perfo...

Tight bounds for learning a mixture of two gaussians

We consider the problem of identifying the parameters of an unknown mixt...

Least Square Value Iteration is Robust Under Locally Bounded Misspecification Error

The success of reinforcement learning heavily relies on the function app...

Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample Complexity for Learning Single Index Models

We focus on the task of learning a single index model σ(w^⋆· x) with res...

Sample Complexity of Learning Heuristic Functions for Greedy-Best-First and A* Search

Greedy best-first search (GBFS) and A* search (A*) are popular algorithm...

Sample Complexity for Quadratic Bandits: Hessian Dependent Bounds and Optimal Algorithms

In stochastic zeroth-order optimization, a problem of practical relevanc...
