Rate of Convergence and Error Bounds for LSTD(λ)

by Manel Tagorti et al.

We consider LSTD(λ), the least-squares temporal-difference algorithm with eligibility traces proposed by Boyan (2002). It computes a linear approximation of the value function of a fixed policy in a large Markov Decision Process. Under a β-mixing assumption, we derive, for any value of λ ∈ (0,1), a high-probability estimate of the rate of convergence of this algorithm to its limit. We deduce a high-probability bound on the error of this algorithm that extends (and slightly improves) the bound derived by Lazaric et al. (2012) in the specific case where λ = 0. In particular, our analysis sheds some light on how to choose λ with respect to the quality of the chosen linear space and the number of samples, a choice that is consistent with our simulations.
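The algorithm analyzed above can be sketched in a few lines: LSTD(λ) accumulates an eligibility trace over a trajectory, builds the matrix A and vector b of the projected fixed-point equation, and solves the resulting linear system for the weight vector. The following is a minimal illustrative sketch, not the authors' implementation; the function and parameter names (`lstd_lambda`, `phi`, `reg`) are our own, and the small ridge term is an assumption added to keep the system solvable on short trajectories.

```python
import numpy as np

def lstd_lambda(trajectory, phi, gamma=0.99, lam=0.5, reg=1e-6):
    """One-pass LSTD(lambda) on a single trajectory.

    trajectory: list of (s, r, s_next) transitions generated by a fixed policy.
    phi: feature map, s -> np.ndarray of shape (d,).
    Returns theta such that V(s) is approximated by phi(s) @ theta.
    """
    d = len(phi(trajectory[0][0]))
    A = reg * np.eye(d)          # small ridge term (our assumption) keeps A invertible
    b = np.zeros(d)
    z = np.zeros(d)              # eligibility trace
    for s, r, s_next in trajectory:
        f, f_next = phi(s), phi(s_next)
        z = gamma * lam * z + f              # decay and accumulate the trace
        A += np.outer(z, f - gamma * f_next) # build the LSTD(lambda) system
        b += z * r
    return np.linalg.solve(A, b)
```

As a sanity check, on a single-state chain with constant reward 1 and γ = 0.9, the true value is 1/(1−γ) = 10, and the sketch recovers it from a long enough trajectory with a single constant feature.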


