Correcting Momentum in Temporal Difference Learning

by Emmanuel Bengio, et al.

A common optimization tool used in deep reinforcement learning is momentum, which accumulates and discounts past gradients and reapplies them at each iteration. We argue that, unlike in supervised learning, momentum in Temporal Difference (TD) learning accumulates gradients that become doubly stale: not only does the gradient of the loss change due to parameter updates, but the loss itself also changes due to bootstrapping. We first show that this phenomenon exists, and then propose a first-order correction term to momentum. We show that this correction term improves sample efficiency in policy evaluation by correcting target value drift. An important insight of this work is that deep RL methods are not always best served by directly importing techniques from the supervised setting.
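To make the "doubly stale" argument concrete, here is a minimal sketch of semi-gradient TD(0) with plain heavy-ball momentum on a linear value function. It is an illustration under stated assumptions, not the authors' code: the function name `td0_momentum`, the toy three-state chain, and all hyperparameters are hypothetical. The first-order correction term proposed in the paper is deliberately not reproduced, since the abstract does not specify its form.

```python
import numpy as np

def td0_momentum(features, transitions, gamma=0.9, lr=0.02, beta=0.9,
                 steps=5000, seed=0):
    """Semi-gradient TD(0) with heavy-ball momentum (illustrative sketch).

    Values are linear: v(s) = features[s] @ theta.
    transitions is a list of (state, reward, next_state) tuples.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(features.shape[1])
    mu = np.zeros_like(theta)  # momentum buffer of accumulated gradients
    for _ in range(steps):
        s, r, s_next = transitions[rng.integers(len(transitions))]
        # Bootstrapped target: it depends on the current theta, so the loss
        # itself moves whenever theta is updated.
        target = r + gamma * (features[s_next] @ theta)
        delta = features[s] @ theta - target
        # Semi-gradient of 0.5 * delta**2 (the target is treated as constant).
        grad = delta * features[s]
        # Gradients stored in mu were computed with older parameters AND
        # older targets; plain momentum reapplies them unchanged.
        mu = beta * mu + grad
        theta = theta - lr * mu
    return theta

# Hypothetical toy chain: 0 -> 1 (reward 0), 1 -> 2 (reward 1), 2 -> 2 (reward 0).
features = np.eye(3)  # tabular (one-hot) features
transitions = [(0, 0.0, 1), (1, 1.0, 2), (2, 0.0, 2)]
theta = td0_momentum(features, transitions)
print(theta)  # estimated values for states 0, 1, 2
```

On this chain the true values under a discount of 0.9 are 0.9, 1.0 and 0.0, so the printed estimates can be checked against them. The comments mark the two sources of staleness the abstract describes: each gradient in the buffer was computed with earlier parameters and against an earlier bootstrapped target.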




Momentum in Reinforcement Learning

We adapt the optimization's concept of momentum to reinforcement learnin...

Gradient Temporal Difference with Momentum: Stability and Convergence

Gradient temporal difference (Gradient TD) algorithms are a popular clas...

Where Did My Optimum Go?: An Empirical Analysis of Gradient Descent Optimization in Policy Gradient Methods

Recent analyses of certain gradient descent optimization methods have sh...

Improving Adversarial Transferability with Spatial Momentum

Deep Neural Networks (DNN) are vulnerable to adversarial examples. Altho...

Investigating the Edge of Stability Phenomenon in Reinforcement Learning

Recent progress has been made in understanding optimisation dynamics in ...

UCB Momentum Q-learning: Correcting the bias without forgetting

We propose UCBMQ, Upper Confidence Bound Momentum Q-learning, a new algo...

MTAdam: Automatic Balancing of Multiple Training Loss Terms

When training neural models, it is common to combine multiple loss terms...
