Finite-Sample Analysis of Off-Policy Natural Actor-Critic with Linear Function Approximation

05/26/2021
by   Zaiwei Chen, et al.

In this paper, we develop a novel variant of the off-policy natural actor-critic algorithm with linear function approximation and establish a sample complexity of 𝒪(ϵ^{-3}), improving on all previously known convergence bounds for such algorithms. To overcome the divergence caused by the deadly triad in off-policy policy evaluation under function approximation, we develop a critic that employs the n-step TD-learning algorithm with a properly chosen n. We present finite-sample convergence bounds for this critic under both constant and diminishing step sizes, which are of independent interest. Furthermore, we develop a variant of natural policy gradient under function approximation with an improved convergence rate of 𝒪(1/T) after T iterations. Combining the finite-sample error bounds of the actor and the critic, we obtain the 𝒪(ϵ^{-3}) sample complexity. We derive our sample complexity bounds solely under the assumption that the behavior policy sufficiently explores all states and actions, which is a much weaker assumption than those made in the related literature.
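
To make the idea of an n-step off-policy TD critic with linear function approximation concrete, here is a minimal sketch of one update step. It follows the generic textbook form of importance-sampled n-step semi-gradient TD for state values and is not necessarily the exact critic analyzed in the paper. The function name, argument layout, and the numbers in the usage lines are hypothetical and chosen only for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def n_step_off_policy_td_update(w, features, rewards, rhos, gamma, alpha):
    """One n-step off-policy semi-gradient TD update (illustrative sketch).

    w        : current weight vector, so that v(s) is approximated by w . phi(s)
    features : n + 1 feature vectors phi(S_t), ..., phi(S_{t+n})
    rewards  : n rewards observed along the segment
    rhos     : n importance ratios pi(a|s) / b(a|s) for A_t, ..., A_{t+n-1}
    gamma    : discount factor
    alpha    : step size
    """
    n = len(rewards)
    # n-step return: discounted rewards plus a bootstrapped tail value.
    n_step_return = sum(gamma ** k * rewards[k] for k in range(n))
    n_step_return += gamma ** n * dot(w, features[n])
    # Product of importance ratios corrects for sampling under the behavior policy.
    rho = 1.0
    for ratio in rhos:
        rho *= ratio
    td_error = n_step_return - dot(w, features[0])
    # Semi-gradient step along the feature vector of the starting state.
    return [wi + alpha * rho * td_error * fi for wi, fi in zip(w, features[0])]


# Hypothetical two-step segment with two-dimensional one-hot features.
w_new = n_step_off_policy_td_update(
    w=[0.0, 0.0],
    features=[[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]],
    rewards=[1.0, 1.0],
    rhos=[1.0, 0.5],
    gamma=0.5,
    alpha=0.1,
)
print(w_new)
```

In this sketch n is simply the length of the reward segment, so the choice of n that the abstract refers to corresponds to how long a trajectory segment is passed in per update.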

research
05/04/2020

A Finite Time Analysis of Two Time-Scale Actor Critic Methods

Actor-critic (AC) methods have exhibited great empirical success compare...
research
02/18/2021

Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm

In this paper, we provide finite-sample convergence guarantees for an of...
research
02/06/2019

Finite-Sample Analysis for SARSA and Q-Learning with Linear Function Approximation

Though the convergence of major reinforcement learning algorithms has be...
research
03/08/2023

Convergence Rates for Localized Actor-Critic in Networked Markov Potential Games

We introduce a class of networked Markov potential games where agents ar...
research
11/04/2021

Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch

In this paper, we establish the global optimality and convergence rate o...
research
06/02/2022

Finite-Time Analysis of Entropy-Regularized Neural Natural Actor-Critic Algorithm

Natural actor-critic (NAC) and its variants, equipped with the represent...
research
06/02/2021

On the Convergence Rate of Off-Policy Policy Optimization Methods with Density-Ratio Correction

In this paper, we study the convergence properties of off-policy policy ...
