On the Effect of Log-Barrier Regularization in Decentralized Softmax Gradient Play in Multiagent Systems

by Runyu Zhang et al.

Softmax policy gradient is a popular algorithm for policy optimization in single-agent reinforcement learning, particularly since projection is not needed for each gradient update. However, in multi-agent systems, the lack of central coordination introduces significant additional difficulties in the convergence analysis. Even for a stochastic game with identical interests, there can be multiple Nash Equilibria (NEs), which rules out proof techniques that rely on the existence of a unique global optimum. Moreover, the softmax parameterization introduces non-NE policies with zero gradient, making NE-seeking difficult for gradient-based algorithms. In this paper, we study the finite-time convergence of decentralized softmax gradient play in a special class of games, Markov Potential Games (MPGs), which include identical-interest games as a special case. We investigate both gradient play and natural gradient play, with and without log-barrier regularization. Establishing convergence for the unregularized cases relies on the assumption that the stationary policies are isolated, and yields convergence bounds containing a trajectory-dependent constant that can be arbitrarily large. We introduce log-barrier regularization to overcome these drawbacks, at the cost of slightly worse dependence on other factors such as the action-set size. An empirical study on an identical-interest matrix game confirms the theoretical findings.
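The decentralized update described above can be illustrated on a two-player identical-interest matrix game. The sketch below is illustrative only: the payoff matrix, step size, and the helper `run_gradient_play` are assumptions for demonstration, not taken from the paper. Each agent independently ascends the softmax policy gradient of the shared reward with respect to its own logits, with an optional log-barrier term `lam * sum_a log pi(a)` added to the objective.

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def run_gradient_play(R, steps=2000, eta=0.5, lam=0.0, seed=0):
    """Decentralized softmax gradient play on a 2-player identical-interest
    matrix game with shared reward R[a1, a2]. Each agent updates only its own
    logits; lam > 0 enables log-barrier regularization (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n1, n2 = R.shape
    th1 = 0.1 * rng.normal(size=n1)   # agent 1 logits
    th2 = 0.1 * rng.normal(size=n2)   # agent 2 logits
    for _ in range(steps):
        p1, p2 = softmax(th1), softmax(th2)
        J = p1 @ R @ p2                       # shared expected reward
        # softmax policy gradient of J w.r.t. each agent's own logits
        g1 = p1 * (R @ p2 - J)
        g2 = p2 * (R.T @ p1 - J)
        # log-barrier gradient: d/d theta_j of sum_a log pi(a) = 1 - n * pi_j
        g1 += lam * (1.0 - n1 * p1)
        g2 += lam * (1.0 - n2 * p2)
        th1 += eta * g1
        th2 += eta * g2
    return softmax(th1), softmax(th2)

# identical-interest game with two pure NEs; (0, 0) has the higher payoff
R = np.array([[1.0, 0.0],
              [0.0, 0.8]])
p1, p2 = run_gradient_play(R, lam=0.0)
print(p1, p2)
```

With `lam = 0` the iterates approach one of the pure NEs, where the softmax gradient vanishes; setting a small `lam > 0` keeps the policies in the interior of the simplex, which is the mechanism the log-barrier analysis exploits.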
