Finite-time analysis of single-timescale actor-critic
Despite the great empirical success of actor-critic methods, its finite-time convergence is still poorly understood in its most practical form. In particular, the analysis of single-timescale actor-critic presents significant challenges due to the highly inaccurate critic estimation and the complex error propagation dynamics over iterations. Existing works on analyzing single-timescale actor-critic only focus on the i.i.d. sampling or tabular setting for simplicity, which is rarely the case in practical applications. We consider the more practical online single-timescale actor-critic algorithm on continuous state space, where the critic is updated with a single Markovian sample per actor step. We prove that the online single-timescale actor-critic method is guaranteed to find an ϵ-approximate stationary point with 𝒪(ϵ^-2) sample complexity under standard assumptions, which can be further improved to 𝒪(ϵ^-2) under i.i.d. sampling. Our analysis develops a novel framework that evaluates and controls the error propagation between actor and critic in a systematic way. To our knowledge, this is the first finite-time analysis for online single-timescale actor-critic method. Overall, our results compare favorably to the existing literature on analyzing actor-critic in terms of considering the most practical settings and requiring weaker assumptions.
READ FULL TEXT