Finite Sample Analyses for TD(0) with Function Approximation

04/04/2017
by   Gal Dalal, et al.
0

TD(0) is one of the most commonly used algorithms in reinforcement learning. Despite this, there is no existing finite sample analysis for TD(0) with function approximation, even for the linear case. Our work is the first to provide such results. Existing convergence rates for Temporal Difference (TD) methods apply only to somewhat modified versions, e.g., projected variants or ones where stepsizes depend on unknown problem parameters. Our analyses obviate these artificial alterations by exploiting strong properties of TD(0). We provide convergence rates both in expectation and with high-probability. The two are obtained via different approaches that use relatively unknown, recently developed stochastic approximation techniques.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/20/2020

Finite-sample Analysis of Greedy-GQ with Linear Function Approximation under Markovian Noise

Greedy-GQ is an off-policy two timescale algorithm for optimal control i...
research
01/15/2015

The Fast Convergence of Incremental PCA

We consider a situation in which we see samples in R^d drawn i.i.d. from...
research
11/20/2019

A Tale of Two-Timescale Reinforcement Learning with the Tightest Finite-Time Bound

Policy evaluation in reinforcement learning is often conducted using two...
research
02/05/2021

Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency

We offer a theoretical characterization of off-policy evaluation (OPE) i...
research
08/11/2021

Truncated Emphatic Temporal Difference Methods for Prediction and Control

Emphatic Temporal Difference (TD) methods are a class of off-policy Rein...
research
12/11/2018

Efficient learning of smooth probability functions from Bernoulli tests with guarantees

We study the fundamental problem of learning an unknown, smooth probabil...
research
10/14/2019

Uniform convergence rates for the approximated halfspace and projection depth

The computational complexity of some depths that satisfy the projection ...

Please sign up or login with your details

Forgot password? Click here to reset