Finite-Time Analysis of Fully Decentralized Single-Timescale Actor-Critic

06/12/2022
by   Qijun Luo, et al.

Decentralized Actor-Critic (AC) algorithms have been widely used for multi-agent reinforcement learning (MARL) and have achieved remarkable success. Despite this empirical success, however, the theoretical convergence properties of decentralized AC algorithms remain largely unexplored. The existing finite-time convergence results are derived based on either a double-loop update or a two-timescale step-size rule, neither of which is often adopted in practical implementations. In this work, we introduce a fully decentralized AC algorithm in which the actor, the critic, and the global reward estimator are updated in an alternating manner with step sizes of the same order, i.e., we adopt a single-timescale update. Theoretically, using linear approximation for value and reward estimation, we show that our algorithm has a sample complexity of 𝒪̃(ϵ^-2) under Markovian sampling, which matches the optimal complexity of double-loop implementations (here, 𝒪̃ hides a log term). The sample complexity can be improved to 𝒪(ϵ^-2) under an i.i.d. sampling scheme. Central to establishing our complexity results is the hidden smoothness of the optimal critic variable that we reveal. We also provide a local-action-privacy-preserving version of our algorithm together with its analysis. Finally, we conduct experiments to show the superiority of our algorithm over existing decentralized AC algorithms.
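The abstract describes alternating single-timescale updates of the actor, the critic, and the global reward estimator under linear function approximation. Below is a minimal sketch of what such an update loop can look like; the toy environment, feature maps, mixing matrix, softmax policy parameterization, and step-size values are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the paper's code) of a single-timescale decentralized
# actor-critic step with linear critic/reward approximation.
import numpy as np

rng = np.random.default_rng(0)

N = 4            # number of agents on a ring communication graph (assumption)
S, A = 5, 3      # toy state / per-agent action space sizes
d = 6            # feature dimension for critic and reward estimator
T = 2000         # iterations
alpha = beta = eta = 0.05   # single timescale: all step sizes of the same order

# Doubly stochastic mixing matrix for a ring graph (illustrative assumption)
W = np.zeros((N, N))
for i in range(N):
    W[i, i] = 0.5
    W[i, (i - 1) % N] = W[i, (i + 1) % N] = 0.25

phi = rng.standard_normal((S, d)) / np.sqrt(d)       # state features
psi = rng.standard_normal((S, A, d)) / np.sqrt(d)    # state-action features

theta = rng.standard_normal((N, A)) * 0.1   # actor params: softmax logits per agent (toy)
w = np.zeros((N, d))                        # critic params, one local copy per agent
lam = np.zeros((N, d))                      # global-reward estimator params per agent
gamma = 0.95                                # discount factor
s = rng.integers(S)

for t in range(T):
    # Each agent samples its local action from its own softmax policy.
    logits = theta - theta.max(axis=1, keepdims=True)
    probs = np.exp(logits); probs /= probs.sum(axis=1, keepdims=True)
    a = np.array([rng.choice(A, p=probs[i]) for i in range(N)])

    # Toy environment: random next state, private local rewards (assumption).
    s_next = rng.integers(S)
    r_local = np.array([psi[s, a[i]] @ rng.standard_normal(d) * 0.1 + 1.0 for i in range(N)])

    # (1) Consensus step: mix critic and reward-estimator params with neighbors.
    w = W @ w
    lam = W @ lam

    # (2) Local alternating updates, all with step sizes of the same order.
    for i in range(N):
        r_hat = psi[s, a[i]] @ lam[i]                                   # estimated global reward
        delta = r_hat + gamma * phi[s_next] @ w[i] - phi[s] @ w[i]      # TD error
        w[i] += alpha * delta * phi[s]                                  # critic TD(0) step
        lam[i] += beta * (r_local[i] - r_hat) * psi[s, a[i]]            # reward-estimator step
        grad_log = -probs[i]; grad_log[a[i]] += 1.0                     # softmax score function
        theta[i] += eta * delta * grad_log                              # actor policy-gradient step

    s = s_next
```

The key point the sketch illustrates is that the critic, reward estimator, and actor are all advanced once per iteration with comparable step sizes, rather than running an inner loop or separating timescales.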
