Q-learning with Nearest Neighbors

02/12/2018
by Devavrat Shah, et al.

We consider the problem of model-free reinforcement learning for infinite-horizon discounted Markov Decision Processes (MDPs) with a continuous state space and unknown transition kernels, when only a single sample path of the system is available. We focus on the classical approach of Q-learning, where the goal is to learn the optimal Q-function. We propose Nearest Neighbor Q-Learning, which uses a nearest neighbor regression method to learn the Q-function. We provide a finite-sample analysis of the convergence rate of this method. In particular, we establish that the algorithm is guaranteed to output an ϵ-accurate estimate of the optimal Q-function with high probability, using a number of observations that depends polynomially on 1/ϵ and the model parameters. To establish our results, we develop a robust version of stochastic approximation results; this may be of interest in its own right.
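To make the idea concrete, here is a minimal sketch, in Python, of one-nearest-neighbor Q-learning over a continuous state space. It is not the authors' implementation: the class name, the one-nearest-neighbor rule, the uniform anchor grid, and the constant step size are all illustrative assumptions, whereas the paper uses a more careful covering of the state space and an appropriate step-size schedule.

```python
import numpy as np

class NearestNeighborQLearning:
    """Illustrative sketch (not the paper's NNQL algorithm).

    Q-values are stored on a finite set of anchor states covering the
    continuous state space. The Q-function at an arbitrary state is read
    off its nearest anchor, and the standard Q-learning update is applied
    at that anchor.
    """

    def __init__(self, anchors, n_actions, gamma=0.95, lr=0.1):
        self.anchors = np.asarray(anchors)        # (m, d) covering points
        self.q = np.zeros((len(self.anchors), n_actions))
        self.gamma = gamma                        # discount factor
        self.lr = lr                              # constant step size (assumption)

    def _nearest(self, state):
        # Index of the anchor closest to `state` in Euclidean distance.
        return int(np.argmin(np.linalg.norm(self.anchors - state, axis=1)))

    def q_value(self, state, action):
        # Nearest-neighbor estimate of Q(state, action).
        return self.q[self._nearest(state), action]

    def update(self, state, action, reward, next_state):
        # Standard Q-learning target, evaluated via the nearest anchors.
        i = self._nearest(state)
        target = reward + self.gamma * self.q[self._nearest(next_state)].max()
        self.q[i, action] += self.lr * (target - self.q[i, action])

# Illustrative usage on a 1-D state space with a 20-point anchor grid.
agent = NearestNeighborQLearning(anchors=np.linspace(0, 1, 20)[:, None],
                                 n_actions=2)
agent.update(state=np.array([0.33]), action=1, reward=0.5,
             next_state=np.array([0.41]))
```

In this simplified form, each observed transition updates only the single nearest anchor; a nearest-neighbor regression scheme as described in the abstract would instead average information across nearby anchors, which is what enables the finite-sample guarantees.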
