Finite Time Analysis of Linear Two-timescale Stochastic Approximation with Markovian Noise

02/04/2020
by Maxim Kaledin, et al.

The linear two-timescale stochastic approximation (SA) scheme is an important class of algorithms that has become popular in reinforcement learning (RL), particularly for the policy evaluation problem. Recently, a number of works have been devoted to the finite-time analysis of the scheme, especially under the Markovian (non-i.i.d.) noise settings that are ubiquitous in practice. In this paper, we provide a finite-time analysis for linear two-timescale SA. Our bounds show that there is no discrepancy in the convergence rate between Markovian and martingale noise; only the constants are affected by the mixing time of the Markov chain. With an appropriate step size schedule, the transient term in the expected error bound is o(1/k^c) and the steady-state term is O(1/k), where c>1 and k is the iteration number. Furthermore, we present an asymptotic expansion of the expected error with a matching lower bound of Ω(1/k). A simple numerical experiment is presented to support our theory.
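The abstract does not spell out the recursion, so the following is only a minimal sketch of what a linear two-timescale SA iteration with Markovian noise can look like, assuming the commonly used coupled form in which a slow iterate theta uses step size beta_k and a fast iterate w uses step size gamma_k, with beta_k/gamma_k tending to 0. Every concrete choice here is an illustrative assumption and not taken from the paper: the scalar coefficients a11, a12, a21, a22, b1, b2, the two-state Markov chain that drives the noise, and the step-size schedules beta_k = 2/(k+10) and gamma_k = 1/(k+10)^(2/3).

```python
import numpy as np


def two_timescale_sa(n_steps=20000, sigma=0.5, stay_prob=0.9, seed=0):
    """Scalar linear two-timescale SA driven by a two-state Markov chain.

    All coefficients and schedules are hypothetical illustration values.
    """
    rng = np.random.default_rng(seed)
    # Assumed system: a11*theta + a12*w = b1 and a21*theta + a22*w = b2,
    # whose solution is theta = w = 2/3.
    a11, a12, a21, a22 = 1.0, 0.5, 0.5, 1.0
    b1, b2 = 1.0, 1.0
    theta, w = 0.0, 0.0
    state = 0  # current state of the noise chain, 0 or 1
    for k in range(n_steps):
        beta = 2.0 / (k + 10)                   # slow step size, O(1/k)
        gamma = 1.0 / (k + 10) ** (2.0 / 3.0)   # fast step size, decays more slowly
        # Markovian (non-i.i.d.) noise: the chain keeps its state with
        # probability stay_prob, so consecutive noise samples are correlated.
        if rng.random() > stay_prob:
            state = 1 - state
        xi = sigma if state == 0 else -sigma    # zero mean under the uniform stationary law
        # Both iterates are updated from the same pair (theta_k, w_k).
        theta_next = theta + beta * (b1 + xi - a11 * theta - a12 * w)
        w_next = w + gamma * (b2 + xi - a21 * theta - a22 * w)
        theta, w = theta_next, w_next
    return theta, w


if __name__ == "__main__":
    theta, w = two_timescale_sa()
    print("theta error:", abs(theta - 2.0 / 3.0))
    print("w error:", abs(w - 2.0 / 3.0))
```

Raising stay_prob makes the chain mix more slowly. Under the abstract's claim this should change the constants in the error but not the rate at which it decays in k, which can be explored by rerunning the sketch with several values of n_steps and stay_prob.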


research
05/27/2019

Finite-Time Analysis of Q-Learning with Linear Function Approximation

In this paper, we consider the model-free reinforcement learning problem...
research
09/21/2018

Finite Sample Analysis of the GTD Policy Evaluation Algorithms in Markov Setting

In reinforcement learning (RL) , one of the key components is policy eva...
research
02/03/2019

Finite-Time Error Bounds For Linear Stochastic Approximation and TD Learning

We consider the dynamics of a linear stochastic approximation algorithm ...
research
07/10/2022

Finite-time High-probability Bounds for Polyak-Ruppert Averaged Iterates of Linear Stochastic Approximation

This paper provides a finite-time analysis of linear stochastic approxim...
research
10/03/2022

Bias and Extrapolation in Markovian Linear Stochastic Approximation with Constant Stepsizes

We consider Linear Stochastic Approximation (LSA) with a constant stepsi...
research
11/20/2019

A Tale of Two-Timescale Reinforcement Learning with the Tightest Finite-Time Bound

Policy evaluation in reinforcement learning is often conducted using two...
research
11/15/2020

Simple and optimal methods for stochastic variational inequalities, II: Markovian noise and policy evaluation in reinforcement learning

The focus of this paper is on stochastic variational inequalities (VI) u...
