Dynamics of Temporal Difference Reinforcement Learning

by Blake Bordelon et al.
Harvard University

Reinforcement learning has been successful in several applications where agents must learn to act in environments with sparse feedback. Despite this empirical success, however, there is still little theoretical understanding of how the parameters of reinforcement learning models and the features used to represent states interact to control the dynamics of learning. In this work, we use concepts from statistical physics to study the typical-case learning curves for temporal difference learning of a value function with linear function approximators. Our theory is derived under a Gaussian equivalence hypothesis, in which averages over random trajectories are replaced with temporally correlated Gaussian feature averages, and we validate these assumptions on small-scale Markov decision processes. We find that the stochastic semi-gradient noise arising from subsampling the space of possible episodes leads to significant plateaus in the value error, unlike in traditional gradient descent dynamics. We study how learning dynamics and plateaus depend on feature structure, learning rate, discount factor, and reward function, and we analyze how strategies such as learning rate annealing and reward shaping can favorably alter them. Our work introduces new tools toward developing a theory of learning dynamics in reinforcement learning.
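The setting the abstract describes, temporal difference learning of a value function with linear function approximation and semi-gradient updates, can be sketched in a few lines. The MDP (a small random-walk chain), the random feature matrix, and the learning rate and discount values below are illustrative choices, not the paper's experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small MDP: a 5-state random-walk chain with terminal ends
# and a reward of 1 for reaching the rightmost state.
n_states, n_features = 5, 3
gamma = 0.9   # discount factor (illustrative)
alpha = 0.1   # learning rate (illustrative)

# Random linear features for each state; V(s) is approximated as Phi[s] @ w.
Phi = rng.standard_normal((n_states, n_features))
w = np.zeros(n_features)

def step(s):
    """Random-walk transition; episode ends at either end of the chain."""
    s_next = min(max(s + rng.choice([-1, 1]), 0), n_states - 1)
    r = 1.0 if s_next == n_states - 1 else 0.0
    done = s_next in (0, n_states - 1)
    return s_next, r, done

for episode in range(500):
    s, done = n_states // 2, False
    while not done:
        s_next, r, done = step(s)
        v = Phi[s] @ w
        v_next = 0.0 if done else Phi[s_next] @ w
        td_error = r + gamma * v_next - v
        # Semi-gradient TD(0): the gradient is taken only through V(s),
        # treating the bootstrapped target r + gamma * V(s') as fixed.
        w += alpha * td_error * Phi[s]
        s = s_next

print("learned state values:", Phi @ w)
```

Because each update uses a single sampled transition rather than the full expectation over episodes, the weight trajectory carries the stochastic semi-gradient noise whose plateau effects the paper analyzes.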




