Decentralized Policy Gradient for Nash Equilibria Learning of General-sum Stochastic Games

by Yan Chen, et al.

We study Nash equilibrium learning in general-sum stochastic games with an unknown transition probability density function. Agents take actions at the current environment state, and their joint action determines both the transition of the environment state and their immediate rewards. Each agent observes only the environment state and its own immediate reward, and does not observe the actions or immediate rewards of the other agents. We introduce the notions of weighted asymptotic Nash equilibrium with probability 1 and in probability. For the case with exact pseudo-gradients, we design a two-loop algorithm based on the equivalence between Nash equilibria and solutions of variational inequality problems. In the outer loop, we sequentially update a constructed strongly monotone variational inequality by adjusting a proximal parameter, while in the inner loop we employ a single-call extra-gradient algorithm to solve the constructed variational inequality. We show that if the associated Minty variational inequality has a solution, then the designed algorithm converges to a k^{1/2}-weighted asymptotic Nash equilibrium. For the case with unknown pseudo-gradients, we further propose a decentralized algorithm in which a G(PO)MDP gradient estimator of the pseudo-gradient is obtained from Monte Carlo simulations. This algorithm converges to a k^{1/4}-weighted asymptotic Nash equilibrium in probability.
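The two-loop structure described in the abstract can be sketched in a few lines. In the toy code below, F stands in for the game's pseudo-gradient operator; the step size eta, the proximal weight mu, and the iteration counts are illustrative assumptions, not the paper's parameter choices. The inner loop is a single-call (Popov-style) extra-gradient method that reuses the previous operator evaluation, so each step makes only one new call to F; the outer loop moves the proximal center, which is what makes the regularized variational inequality strongly monotone.

```python
import numpy as np

def inner_extra_gradient(F, x_ref, eta=0.1, mu=0.5, steps=50):
    """Single-call extra-gradient on the regularized operator
    F_mu(x) = F(x) + mu * (x - x_ref). Illustrative parameters."""
    x = x_ref.copy()
    g_prev = F(x) + mu * (x - x_ref)            # operator value carried over
    for _ in range(steps):
        x_half = x - eta * g_prev               # extrapolation, no new F call
        g = F(x_half) + mu * (x_half - x_ref)   # the single F call per step
        x = x - eta * g
        g_prev = g
    return x

def two_loop_vi_solver(F, x0, outer=50, **kw):
    """Outer loop: repeatedly re-center the proximal term at the
    inner loop's output, driving the iterates toward a VI solution."""
    x_ref = x0.copy()
    for _ in range(outer):
        x_ref = inner_extra_gradient(F, x_ref, **kw)
    return x_ref

# Toy monotone (but not strongly monotone) operator: F(x) = A x with A skew,
# so the unique VI solution over R^2 is x = 0.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
F = lambda x: A @ x
x_star = two_loop_vi_solver(F, np.array([1.0, 1.0]))
```

The proximal term is essential here: plain gradient steps on a skew-symmetric operator cycle or diverge, while each regularized inner problem is strongly monotone and the re-centering contracts toward the underlying solution.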

