Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error

by Scott Fujimoto et al.

In this work, we study the use of the Bellman equation as a surrogate objective for value prediction accuracy. While the Bellman equation is uniquely solved by the true value function over all state-action pairs, we find that the Bellman error (the difference between the two sides of the equation) is a poor proxy for the accuracy of the value function. In particular, we show that (1) due to cancellations between the two sides of the Bellman equation, the magnitude of the Bellman error is only weakly related to the distance to the true value function, even when considering all state-action pairs, and (2) in the finite data regime, the Bellman equation can be satisfied exactly by infinitely many suboptimal solutions. This means that the Bellman error can be minimized without improving the accuracy of the value function. We demonstrate these phenomena through a series of propositions, illustrative toy examples, and empirical analysis in standard benchmark domains.
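
The two claims above can be illustrated with a small toy sketch. This is not the paper's experimental setup. It assumes a hypothetical four-state deterministic cycle with a single action, made-up rewards, γ = 0.99, and a one-transition dataset. All helper names (`true_q`, `bellman_error`, `value_error`, `dataset_bellman_error`) are illustrative only.

The key identity is as follows. If Δ = Q − Q* is the value error, then the Bellman error at s is Q(s) − (r(s) + γQ(s')) = Δ(s) − γΔ(s'), because Q* satisfies the Bellman equation. A constant offset therefore largely cancels, while an alternating offset does not.

```python
# Toy sketch (assumed setup, not the paper's experiments).
# Part 1: deterministic cycle s -> (s + 1) % n with one action, so Q is a list.
GAMMA = 0.99


def true_q(rewards, gamma=GAMMA, iters=10_000):
    """Q* obtained by repeatedly applying the Bellman operator (a contraction)."""
    n = len(rewards)
    q = [0.0] * n
    for _ in range(iters):
        q = [rewards[s] + gamma * q[(s + 1) % n] for s in range(n)]
    return q


def bellman_error(q, rewards, gamma=GAMMA):
    """Largest |Q(s) - (r(s) + gamma * Q(s'))| over ALL states."""
    n = len(q)
    return max(abs(q[s] - (rewards[s] + gamma * q[(s + 1) % n])) for s in range(n))


def value_error(q, q_star):
    """Largest |Q(s) - Q*(s)| over all states."""
    return max(abs(a - b) for a, b in zip(q, q_star))


rewards = [1.0, 0.0, 2.0, -1.0]  # hypothetical rewards
q_star = true_q(rewards)

# (1) Cancellation: a constant offset of 10 mostly cancels between the two
# sides, leaving a Bellman error of (1 - gamma) * 10 = 0.1 at every state.
q_shift = [v + 10.0 for v in q_star]
# An alternating +/-1 offset (even number of states) does not cancel:
# the Bellman error is (1 + gamma) * 1 = 1.99 at every state.
q_alt = [v + (1.0 if s % 2 == 0 else -1.0) for s, v in enumerate(q_star)]

for name, q in [("shifted", q_shift), ("alternating", q_alt)]:
    print(f"{name:12s} Bellman error {bellman_error(q, rewards):.2f}  "
          f"value error {value_error(q, q_star):.2f}")


# (2) Finite data: the only transition is (s=0, a=0, r=1, s'=1), and the
# evaluated policy picks a=1 in s=1. The pair (1, 1) never appears as a
# starting (s, a) in the dataset, so Q(1, 1) is unconstrained by the data.
dataset = [(0, 0, 1.0, 1)]
policy = {1: 1}


def dataset_bellman_error(q, dataset, policy, gamma=GAMMA):
    """Largest Bellman error over the transitions in the dataset only."""
    return max(abs(q[(s, a)] - (r + gamma * q[(s2, policy[s2])]))
               for s, a, r, s2 in dataset)


for x in (-100.0, 0.0, 100.0):
    q = {(1, 1): x, (0, 0): 1.0 + GAMMA * x}
    print(f"Q(1,1) = {x:7.1f}: dataset Bellman error "
          f"{dataset_bellman_error(q, dataset, policy)}")
```

In the first part, the estimate with the smaller Bellman error (0.1 versus 1.99) is ten times farther from Q* (value error 10 versus 1). This holds even though every state is evaluated. In the second part, every choice of Q(1, 1) gives exactly zero Bellman error on the dataset, yet at most one of those choices can equal the true value.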
