Taylor TD-learning

02/27/2023
by Michele Garibbo, et al.

Many reinforcement learning approaches rely on temporal-difference (TD) learning to learn a critic. However, TD-learning updates can have high variance because they rely solely on Monte Carlo estimates of the updates. Here, we introduce a model-based RL framework, Taylor TD, which reduces this variance. Taylor TD uses a first-order Taylor series expansion of TD updates. This expansion allows Taylor TD to analytically integrate over stochasticity in the action choice, and over some stochasticity in the state distribution, for the initial state and action of each TD update. We include theoretical and empirical evidence that Taylor TD updates have lower variance than standard TD updates. Additionally, we show that Taylor TD has the same stable learning guarantees as standard TD-learning under linear function approximation. Next, we combine Taylor TD with the TD3 algorithm (Fujimoto et al., 2018) to form TaTD3. We show that TaTD3 performs as well as, if not better than, several state-of-the-art model-free and model-based baseline algorithms on a set of standard benchmark tasks. Finally, we include further analysis of the settings in which Taylor TD may be most beneficial to performance relative to standard TD-learning.
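
The variance argument can be illustrated with a toy sketch. It is not the authors' implementation, and every function in it (`delta`, `h`, and their derivatives) is a hypothetical stand-in chosen for illustration: `delta` plays the role of a TD error and `h` the role of a critic-gradient term, both as smooth functions of a scalar action. Assuming zero-mean Gaussian action noise with standard deviation `sigma`, expanding each factor to first order around the mean action gives E[(δ + ξδ')(h + ξh')] = δh + σ²δ'h', which can be computed without sampling. The expansion drops higher-order terms, so it is an approximation of the expected update rather than an exact value.

```python
import numpy as np

# Hypothetical smooth stand-ins (assumptions, not from the paper):
# delta(a) mimics a TD error, h(a) mimics a critic-gradient term.
def delta(a):
    return np.sin(a)

def d_delta(a):
    return np.cos(a)

def h(a):
    return a ** 2

def d_h(a):
    return 2.0 * a

def mc_update(a, sigma, rng, n_samples=1):
    """Monte Carlo estimate of E[delta(a + xi) * h(a + xi)], xi ~ N(0, sigma^2)."""
    xi = rng.normal(0.0, sigma, size=n_samples)
    return np.mean(delta(a + xi) * h(a + xi))

def taylor_update(a, sigma):
    """First-order Taylor approximation of the same expectation.

    Terms linear in xi vanish in expectation; the cross term keeps sigma^2.
    """
    return delta(a) * h(a) + sigma ** 2 * d_delta(a) * d_h(a)

rng = np.random.default_rng(0)
a, sigma = 0.5, 0.1

# Repeat the single-sample Monte Carlo update many times to measure its spread.
mc_estimates = np.array([mc_update(a, sigma, rng) for _ in range(10_000)])
mc_variance = mc_estimates.var()

# The Taylor update is deterministic given (a, sigma): no sampling noise.
taylor_value = taylor_update(a, sigma)

print("MC mean:", mc_estimates.mean(), "MC variance:", mc_variance)
print("Taylor value:", taylor_value)
```

In this toy setting the single-sample Monte Carlo update fluctuates from draw to draw, while the Taylor value is fixed for a given mean action and noise scale. The price is a small bias from the truncated higher-order terms.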


