Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view

11/19/2010
by Bruno Scherrer, et al.

We investigate projection methods for evaluating a linear approximation of the value function of a policy in a Markov Decision Process. We consider two popular approaches: the one-step Temporal Difference fixed-point computation (TD(0)) and Bellman Residual (BR) minimization. We describe examples in which each method outperforms the other. We highlight a simple relation between the objective functions they minimize, and show that while BR enjoys a performance guarantee, TD(0) does not in general. We then propose a unified view in terms of oblique projections of the Bellman equation, which substantially simplifies and extends the characterization of (Schoknecht, 2002) and the recent analysis of (Yu & Bertsekas, 2008). Finally, we describe simulations suggesting that, although the TD(0) solution is usually slightly better than the BR solution, its inherent numerical instability makes it very bad in some cases, and thus worse on average.
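To make the two estimators in the abstract concrete, here is a minimal sketch in plain Python. It is not taken from the paper: the function names, the restriction to a single feature (so each solution is a scalar weight), the uniform state weighting, and the two-state chain are all assumptions chosen for illustration. The TD(0) weight solves the projected fixed-point condition, while the BR weight minimizes the weighted squared Bellman residual.

```python
def bellman_feature(P, phi, gamma):
    """Return psi = (I - gamma * P) phi for a single feature vector phi."""
    n = len(phi)
    return [phi[i] - gamma * sum(P[i][j] * phi[j] for j in range(n))
            for i in range(n)]


def td_fixed_point(P, r, phi, d, gamma):
    """Weight w with phi^T D (psi * w - r) = 0 (TD(0) fixed point).

    Undefined when phi^T D psi = 0, which is the source of instability.
    """
    psi = bellman_feature(P, phi, gamma)
    num = sum(d[i] * phi[i] * r[i] for i in range(len(phi)))
    den = sum(d[i] * phi[i] * psi[i] for i in range(len(phi)))
    return num / den


def br_minimizer(P, r, phi, d, gamma):
    """Weight w minimizing the D-weighted squared Bellman residual."""
    psi = bellman_feature(P, phi, gamma)
    num = sum(d[i] * psi[i] * r[i] for i in range(len(phi)))
    den = sum(d[i] * psi[i] * psi[i] for i in range(len(phi)))
    return num / den


def residual_norm_sq(w, P, r, phi, d, gamma):
    """D-weighted squared norm of phi*w - (r + gamma * P * phi * w)."""
    psi = bellman_feature(P, phi, gamma)
    return sum(d[i] * (psi[i] * w - r[i]) ** 2 for i in range(len(phi)))


# Hypothetical two-state chain: state 0 moves to state 1, state 1 is absorbing.
P = [[0.0, 1.0], [0.0, 1.0]]
r = [0.0, 1.0]      # assumed rewards
phi = [1.0, 2.0]    # a single assumed feature
d = [0.5, 0.5]      # uniform state weighting (an assumption)
gamma = 0.9         # for this chain the TD denominator vanishes at gamma = 5/6

w_td = td_fixed_point(P, r, phi, d, gamma)
w_br = br_minimizer(P, r, phi, d, gamma)
print("TD(0) weight:", w_td, "residual:", residual_norm_sq(w_td, P, r, phi, d, gamma))
print("BR weight:   ", w_br, "residual:", residual_norm_sq(w_br, P, r, phi, d, gamma))
```

In this toy setting the TD(0) denominator is close to zero because the discount factor is near 5/6, so the TD(0) weight is large in magnitude, whereas the BR weight stays bounded. This is only one hand-picked instance; as the abstract notes, there are also cases where TD(0) does better.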


research
10/27/2020

Temporal Difference Learning as Gradient Splitting

Temporal difference learning with linear function approximation is a pop...
research
06/06/2018

A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation

Temporal difference learning (TD) is a simple iterative algorithm used t...
research
01/22/2013

Properties of the Least Squares Temporal Difference learning algorithm

This paper presents four different ways of looking at the well-known Lea...
research
05/13/2014

Rate of Convergence and Error Bounds for LSTD(λ)

We consider LSTD(λ), the least-squares temporal-difference algorithm wit...
research
07/29/2023

First-order Policy Optimization for Robust Policy Evaluation

We adopt a policy optimization viewpoint towards policy evaluation for r...
research
06/24/2016

Is the Bellman residual a bad proxy?

This paper aims at theoretically and empirically comparing two standard ...
research
12/13/2015

True Online Temporal-Difference Learning

The temporal-difference methods TD(λ) and Sarsa(λ) form a core part of m...
