On a Variance Reduction Correction of the Temporal Difference for Policy Evaluation in the Stochastic Continuous Setting

02/16/2022
by Ziad Kobeissi, et al.

This paper deals with solving continuous-time, continuous-state and continuous-action optimization problems in stochastic settings using reinforcement learning algorithms, and focuses on the policy evaluation step. We prove that standard learning algorithms based on the discretized temporal difference are doomed to fail as the time step tends to zero, because of the stochastic part of the dynamics. We propose a variance-reduction correction of the temporal difference, which leads to new learning algorithms that remain stable as the time step vanishes. This allows us to give theoretical guarantees that our algorithms converge to the solutions of continuous stochastic optimization problems.
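To make the failure mechanism concrete, here is a minimal numerical sketch under assumptions of our own choosing, not the paper's setup: a 1-D Ornstein-Uhlenbeck process dX = -θX dt + σ dW simulated with one Euler-Maruyama step, reward r(x) = x², and a fixed candidate value function V(x) = x². The Brownian increment ΔW enters the discretized TD error δ at order √dt, so the scaled error δ/dt has variance of order 1/dt and blows up as dt → 0. Subtracting the zero-mean term e^(-ρ dt) V'(x) σ ΔW, a standard control variate, removes that leading noise term. This is a generic illustration of variance reduction and is not claimed to be the exact correction proposed in the paper; the function `td_errors` and the parameters `theta`, `sigma`, `rho` are hypothetical.

```python
import numpy as np


def V(y):
    """Hypothetical candidate value function V(y) = y**2."""
    return y ** 2


def dV(y):
    """Derivative V'(y) of the candidate value function."""
    return 2.0 * y


def td_errors(x, dt, n_samples=200_000, theta=1.0, sigma=1.0, rho=0.5, seed=0):
    """Return (naive, corrected) TD errors scaled by 1/dt at state x.

    Illustrative assumptions (not from the paper): OU dynamics
    dX = -theta*X dt + sigma dW, reward r(x) = x**2, discount rate rho,
    one Euler-Maruyama step of size dt.
    """
    rng = np.random.default_rng(seed)
    dw = np.sqrt(dt) * rng.standard_normal(n_samples)  # Brownian increments
    x_next = x - theta * x * dt + sigma * dw            # Euler-Maruyama step
    reward = x ** 2
    discount = np.exp(-rho * dt)

    # Naive discretized TD error, scaled by 1/dt.
    naive = (reward * dt + discount * V(x_next) - V(x)) / dt

    # Control variate: subtract the zero-mean first-order martingale term
    # discount * V'(x) * sigma * dW, which is of size sqrt(dt) before scaling.
    corrected = naive - discount * dV(x) * sigma * dw / dt
    return naive, corrected


for dt in [1e-1, 1e-2, 1e-3, 1e-4]:
    naive, corrected = td_errors(x=1.0, dt=dt)
    print(f"dt={dt:.0e}  var(naive)={naive.var():10.2f}  "
          f"var(corrected)={corrected.var():6.3f}")
```

Running the loop should show the variance of the naive scaled TD error growing roughly like 1/dt, while the corrected version stays bounded. Because the subtracted term has zero mean, the correction changes the noise but not the expected TD error.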

Related research
11/11/2019

Stochastic Difference-of-Convex Algorithms for Solving nonconvex optimization problems

The paper deals with stochastic difference-of-convex functions programs,...
11/24/2021

Adaptively Calibrated Critic Estimates for Deep Reinforcement Learning

Accurate value estimates are important for off-policy reinforcement lear...
06/01/2020

Temporal-Differential Learning in Continuous Environments

In this paper, a new reinforcement learning (RL) method known as the met...
02/20/2023

Backstepping Temporal Difference Learning

Off-policy learning ability is an important feature of reinforcement lea...
11/29/2022

Closing the gap between SVRG and TD-SVRG with Gradient Splitting

Temporal difference (TD) learning is a simple algorithm for policy evalu...
01/23/2021

Safe Learning and Optimization Techniques: Towards a Survey of the State of the Art

Safe learning and optimization deals with learning and optimization prob...
08/15/2021

Policy Evaluation and Temporal-Difference Learning in Continuous Time and Space: A Martingale Approach

We propose a unified framework to study policy evaluation (PE) and the a...