Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning

08/23/2020
by   Shuang Qiu, et al.

Temporal-difference (TD) learning with smooth nonlinear function approximation for policy evaluation has achieved great success in modern reinforcement learning. Such a problem can be reformulated as a stochastic nonconvex-strongly-concave optimization problem, which is challenging because the naive stochastic gradient descent-ascent algorithm suffers from slow convergence. Existing approaches rely on two-timescale or double-loop stochastic gradient algorithms, which may also require sampling a large batch of data per iteration. In practice, however, a single-timescale single-loop stochastic algorithm is preferred for its simplicity and because its step sizes are easier to tune. In this paper, we propose two single-timescale single-loop algorithms that require only one data point per step. Our first algorithm applies momentum updates to both the primal and dual variables and achieves an O(ε^-4) sample complexity, demonstrating the key role of momentum in obtaining a single-timescale algorithm. Our second algorithm improves on the first by applying variance reduction on top of momentum; it matches the best known O(ε^-3) sample complexity in existing works, and it does so without requiring a large-batch checkpoint. Moreover, the theoretical results for both algorithms are stated in the tighter form of simultaneous convergence on the primal and dual sides.
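To make the single-timescale idea concrete, the following is a minimal illustrative sketch (not the paper's exact algorithm) of momentum-based stochastic gradient descent-ascent: one noisy sample per step, momentum maintained on both the primal and dual variables, and a single shared step size. The toy objective, the step sizes, and all function names here are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy objective f(x, y) = sin(x) + x*y - 0.5*y^2,
# which is nonconvex in the primal variable x and strongly concave
# in the dual variable y.
def stoch_grads(x, y, sigma=0.1):
    """Return noisy partial gradients, mimicking one data point per step."""
    gx = np.cos(x) + y + sigma * rng.standard_normal()  # noisy grad w.r.t. x
    gy = x - y + sigma * rng.standard_normal()          # noisy grad w.r.t. y
    return gx, gy

def momentum_gda(steps=20000, eta=0.01, beta=0.1):
    """Single-loop GDA with momentum on both variables and one step size."""
    x, y = 1.0, 1.0     # primal and dual iterates
    mx, my = 0.0, 0.0   # momentum (moving-average) gradient estimates
    for _ in range(steps):
        gx, gy = stoch_grads(x, y)
        # Exponential moving average of the stochastic gradients:
        mx = (1 - beta) * mx + beta * gx
        my = (1 - beta) * my + beta * gy
        x -= eta * mx   # descent step on the primal side
        y += eta * my   # ascent step on the dual side (same step size eta)
    return x, y

x, y = momentum_gda()
```

Because the dual update y += eta * my drives y toward the inner maximizer y = x, the iterates settle near a stationary point of the outer problem; the momentum averages damp the per-sample noise even though each step uses only one sample and one step size.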


