On Well-posedness and Minimax Optimal Rates of Nonparametric Q-function Estimation in Off-policy Evaluation

01/17/2022
by Xiaohong Chen, et al.

We study the off-policy evaluation (OPE) problem in an infinite-horizon Markov decision process with continuous states and actions. We recast Q-function estimation as a special form of the nonparametric instrumental variables (NPIV) estimation problem. We first show that, under one mild condition, the NPIV formulation of Q-function estimation is well-posed in the sense of the L^2-measure of ill-posedness with respect to the data-generating distribution, bypassing a strong assumption on the discount factor γ imposed in the recent literature to obtain L^2 convergence rates for various Q-function estimators. Thanks to this well-posedness property, we derive the first minimax lower bounds for the convergence rates of nonparametric estimation of the Q-function and its derivatives in both sup-norm and L^2-norm, which are shown to be the same as those for classical nonparametric regression (Stone, 1982). We then propose a sieve two-stage least squares estimator and establish its rate-optimality in both norms under some mild conditions. Our general results on well-posedness and the minimax lower bounds are of independent interest for studying not only other nonparametric estimators of the Q-function but also efficient estimation of the value of any target policy in off-policy settings.
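To make the NPIV reading concrete: the Bellman equation for a target policy π can be written as the conditional moment restriction E[R + γ Q(S', π(S')) - Q(S, A) | S, A] = 0, where (S, A) plays the role of the instrument. The following is a minimal sketch of a generic sieve two-stage least squares fit of that restriction, not the estimator as specified in the paper. The polynomial sieve, the helper names (`poly_basis`, `sieve_2sls_q`), the deterministic target policy, and the toy linear MDP are all assumptions chosen for illustration; the toy MDP is picked so that the true Q-function has a closed form to compare against.

```python
import numpy as np

def poly_basis(s, a, degree):
    """Polynomial sieve: columns s**i * a**j for all i + j <= degree."""
    cols = [s**i * a**j for i in range(degree + 1) for j in range(degree + 1 - i)]
    return np.column_stack(cols)

def sieve_2sls_q(S, A, R, S_next, policy, gamma, degree_q=2, degree_iv=3):
    """Hypothetical helper: sieve 2SLS fit of E[R - (Q(S,A) - gamma*Q(S',pi(S'))) | S,A] = 0."""
    psi = poly_basis(S, A, degree_q)                      # sieve for Q at (S, A)
    psi_next = poly_basis(S_next, policy(S_next), degree_q)  # sieve at (S', pi(S'))
    D = psi - gamma * psi_next                            # "endogenous regressors"
    B = poly_basis(S, A, degree_iv)                       # instrument sieve in (S, A)
    # Stage 1: project D onto the span of the instrument basis.
    D_hat = B @ np.linalg.lstsq(B, D, rcond=None)[0]
    # Stage 2: regress the reward on the projected regressors.
    coef = np.linalg.lstsq(D_hat, R, rcond=None)[0]
    return lambda s, a: poly_basis(np.atleast_1d(s), np.atleast_1d(a), degree_q) @ coef

# Toy transition tuples (assumed, not from the paper): uniform behavior policy,
# linear dynamics, reward R = S + A, target policy pi(s) = 0.5.
rng = np.random.default_rng(0)
n, gamma = 5000, 0.5
S = rng.uniform(0, 1, n)
A = rng.uniform(0, 1, n)
S_next = 0.5 * S + 0.25 * A + 0.25 * rng.uniform(0, 1, n)
R = S + A
target_policy = lambda s: np.full_like(s, 0.5)

q_hat = sieve_2sls_q(S, A, R, S_next, target_policy, gamma)

def true_q(s, a):
    """Closed-form Q for the toy MDP, using V(s) = alpha + beta * s."""
    beta = 1.0 / (1.0 - 0.5 * gamma)
    alpha = (0.5 + 0.25 * gamma * beta) / (1.0 - gamma)
    return s + a + gamma * (alpha + beta * (0.5 * s + 0.25 * a + 0.125))

s_grid, a_grid = (g.ravel() for g in np.meshgrid(np.linspace(0, 1, 5), np.linspace(0, 1, 5)))
max_err = np.max(np.abs(q_hat(s_grid, a_grid) - true_q(s_grid, a_grid)))
print("max abs error on grid:", max_err)
```

Because the true Q-function of this toy MDP is linear in (s, a), it lies inside the polynomial sieve, so the sketch only illustrates the mechanics of the two stages. It says nothing about the rates or tuning of sieve dimensions discussed in the abstract.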

research
06/28/2021

Instance-optimality in optimal value estimation: Adaptivity via variance-reduced Q-learning

Various algorithms in reinforcement learning exhibit dramatic variabilit...
research
12/15/2020

Minimax Risk and Uniform Convergence Rates for Nonparametric Dyadic Regression

Let i=1,…,N index a simple random sample of units drawn from some large ...
research
04/06/2021

Nonparametric needlet estimation for partial derivatives of a probability density function on the d-torus

This paper is concerned with the estimation of the partial derivatives o...
research
02/10/2023

Minimax Instrumental Variable Regression and L_2 Convergence Guarantees without Identification or Closedness

In this paper, we study nonparametric estimation of instrumental variabl...
research
02/23/2017

Sobolev Norm Learning Rates for Regularized Least-Squares Algorithm

Learning rates for regularized least-squares algorithms are in most case...
research
01/08/2019

Monotone Least Squares and Isotonic Quantiles

We consider bivariate observations (X_1,Y_1),...,(X_n,Y_n) such that, co...
research
03/30/2018

Minimax Estimation of Quadratic Fourier Functionals

We study estimation of (semi-)inner products between two nonparametric p...
