Schedule Based Temporal Difference Algorithms

11/23/2021
by Rohan Deb, et al.

Learning the value function of a given policy from data samples is an important problem in Reinforcement Learning. TD(λ) is a popular class of algorithms for solving this problem. However, the weights that TD(λ) assigns to the different n-step returns, which are controlled by the parameter λ, decrease exponentially as n increases. In this paper, we present a λ-schedule procedure that generalizes the TD(λ) algorithm to the case where the parameter λ may vary with the time step. This allows flexibility in weight assignment: the user can specify the weights assigned to the different n-step returns by choosing a sequence {λ_t}_{t ≥ 1}. Based on this procedure, we propose one on-policy algorithm, TD(λ)-schedule, and two off-policy algorithms, GTD(λ)-schedule and TDC(λ)-schedule. We provide proofs of almost sure convergence for all three algorithms under a general Markov noise framework.
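To make the weighting concrete, the following Python sketch compares the standard TD(λ) weights, (1 - λ)λ^(n-1) on the n-step return, with weights induced by a time-varying sequence {λ_t}. The product form used in `schedule_weights` is an illustrative assumption, chosen because it reduces to the TD(λ) weights when the sequence is constant; it is not necessarily the exact construction used in the paper. Both function names are hypothetical.

```python
from typing import List, Sequence


def td_lambda_weights(lam: float, n_max: int) -> List[float]:
    """Standard TD(lambda) weights (1 - lam) * lam**(n - 1) for n = 1..n_max."""
    return [(1.0 - lam) * lam ** (n - 1) for n in range(1, n_max + 1)]


def schedule_weights(lams: Sequence[float]) -> List[float]:
    """Weights induced by a schedule lams = [lambda_1, lambda_2, ...].

    ASSUMPTION (not taken from the paper): the n-step return gets weight
    (1 - lambda_n) * lambda_1 * ... * lambda_{n-1}.
    """
    weights = []
    carry = 1.0  # running product lambda_1 * ... * lambda_{n-1}
    for lam_n in lams:
        weights.append(carry * (1.0 - lam_n))
        carry *= lam_n
    return weights


if __name__ == "__main__":
    # Constant schedule: same exponentially decaying weights as TD(lambda).
    print(td_lambda_weights(0.5, 4))
    print(schedule_weights([0.5] * 4))
    # Varying schedule: the weights no longer have to decay with n.
    print(schedule_weights([0.9, 0.5, 0.0]))
```

Under this assumed form, a schedule that ends with λ_n = 0 places all remaining weight on the n-step return, so the weights sum to one over a finite horizon.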


