Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation

by Gandharv Patil, et al.

We study the finite-time behaviour of the popular temporal difference (TD) learning algorithm when combined with tail-averaging. We derive finite-time bounds on the parameter error of the tail-averaged TD iterate under a step-size choice that does not require information about the eigenvalues of the matrix underlying the projected TD fixed point. Our analysis shows that tail-averaged TD converges at the optimal O(1/t) rate, both in expectation and with high probability. In addition, our bounds exhibit a sharper rate of decay for the initial error (bias), which is an improvement over averaging all iterates. We also propose and analyse a variant of TD that incorporates regularisation. From our analysis, we conclude that the regularised version of TD is useful for problems with ill-conditioned features.
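To make the idea of tail-averaging concrete, here is a minimal sketch of TD(0) with linear function approximation that returns the average of the iterates after a chosen tail index. It is an illustration under stated assumptions, not the paper's exact algorithm: the function name `tail_averaged_td0`, the constant step size, the tail index `tail_start`, and the toy three-state Markov reward process are all hypothetical choices. The abstract does not give the form of the regularised variant, so the optional `reg` term below is an assumed ridge-style shrinkage, disabled by default.

```python
import numpy as np


def tail_averaged_td0(features, P, rewards, gamma=0.9, step_size=0.05,
                      num_steps=20000, tail_start=10000, reg=0.0, seed=0):
    """Run TD(0) with linear features and return the tail-averaged parameter.

    features : (num_states, d) array, one feature vector per state
    P        : (num_states, num_states) transition matrix of the evaluated policy
    rewards  : (num_states,) reward received when leaving each state
    reg      : ASSUMED ridge-style coefficient (0.0 means plain TD(0))
    """
    rng = np.random.default_rng(seed)
    num_states, d = features.shape
    theta = np.zeros(d)
    tail_sum = np.zeros(d)
    state = rng.integers(num_states)

    for t in range(num_steps):
        next_state = rng.choice(num_states, p=P[state])
        phi, phi_next = features[state], features[next_state]

        # TD error for the observed transition (state -> next_state).
        td_error = rewards[state] + gamma * (phi_next @ theta) - phi @ theta

        # Semi-gradient TD(0) step; the reg term is an assumption, not from the source.
        theta = theta + step_size * (td_error * phi - reg * theta)

        # Only iterates from the tail_start-th update onwards enter the average.
        if t >= tail_start:
            tail_sum += theta

        state = next_state

    return tail_sum / (num_steps - tail_start)


if __name__ == "__main__":
    # Toy 3-state Markov reward process with tabular (one-hot) features.
    P = np.array([[0.5, 0.5, 0.0],
                  [0.1, 0.6, 0.3],
                  [0.2, 0.2, 0.6]])
    rewards = np.array([1.0, 0.0, -1.0])
    theta_bar = tail_averaged_td0(np.eye(3), P, rewards, gamma=0.5)
    v_true = np.linalg.solve(np.eye(3) - 0.5 * P, rewards)
    print("tail-averaged estimate:", theta_bar)
    print("exact value function:  ", v_true)
```

With one-hot features the TD fixed point coincides with the exact value function, so the two printed vectors can be compared directly. Discarding the first `tail_start` iterates keeps the early, initialisation-dominated iterates out of the average, which is the intuition behind the faster decay of the bias term compared with averaging all iterates.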




On TD(0) with function approximation: Concentration bounds and a centered variant with exponential convergence

We provide non-asymptotic bounds for the well-known temporal difference ...

Beating SGD Saturation with Tail-Averaging and Minibatching

While stochastic gradient descent (SGD) is one of the major workhorses i...

Linear Stochastic Approximation: Constant Step-Size and Iterate Averaging

We consider d-dimensional linear stochastic approximation algorithms (LS...

Anytime Tail Averaging

Tail averaging consists in averaging the last examples in a stream. Comm...

Two-Tailed Averaging: Anytime Adaptive Once-in-a-while Optimal Iterate Averaging for Stochastic Optimization

Tail averaging improves on Polyak averaging's non-asymptotic behaviour b...

Finite-time High-probability Bounds for Polyak-Ruppert Averaged Iterates of Linear Stochastic Approximation

This paper provides a finite-time analysis of linear stochastic approxim...

Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis

We consider the off-policy evaluation problem in Markov decision process...
