Provably Efficient Reinforcement Learning in Decentralized General-Sum Markov Games

by Weichao Mao, et al.

This paper addresses the problem of efficiently learning an equilibrium in general-sum Markov games through decentralized multi-agent reinforcement learning. Given the fundamental difficulty of computing a Nash equilibrium (NE), we instead aim to find a coarse correlated equilibrium (CCE), a solution concept that generalizes NE by allowing possible correlations among the agents' strategies. We propose an algorithm in which each agent independently runs optimistic V-learning (a variant of Q-learning) to efficiently explore the unknown environment, while using a stabilized online mirror descent (OMD) subroutine for policy updates. We show that the agents can find an ϵ-approximate CCE in at most O(H^6 S A / ϵ^2) episodes, where S is the number of states, A is the size of the largest individual action space, and H is the length of an episode. This appears to be the first sample complexity result for learning in generic general-sum Markov games. Our results rely on a novel investigation of an anytime high-probability regret bound for OMD with a dynamic learning rate and weighted regret, which may be of independent interest. One key feature of our algorithm is that it is fully decentralized, in the sense that each agent has access only to its local information and is completely oblivious to the presence of others. As a result, our algorithm readily scales to an arbitrary number of agents without suffering an exponential dependence on the number of agents.
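To make the OMD subroutine mentioned above concrete, below is a minimal, illustrative sketch of an online mirror descent step on the probability simplex with the negative-entropy regularizer (i.e., the multiplicative-weights form), run with a decaying learning rate. This is not the paper's exact algorithm: the loss vectors, the learning-rate schedule, and the stabilization details here are placeholder assumptions chosen only to show the update's shape.

```python
import numpy as np

def omd_simplex_update(policy, loss, lr):
    """One online mirror descent step over the probability simplex with the
    negative-entropy regularizer (equivalent to a multiplicative-weights
    update): new_policy ∝ policy * exp(-lr * loss)."""
    logits = np.log(policy) - lr * loss
    logits -= logits.max()              # shift for numerical stability
    new_policy = np.exp(logits)
    return new_policy / new_policy.sum()

# Toy run with a dynamic (decaying) learning rate; the 1/sqrt(t) schedule
# and random losses are illustrative, not the schedule used in the paper.
rng = np.random.default_rng(0)
A = 4                                   # size of the individual action space
policy = np.full(A, 1.0 / A)            # start from the uniform policy
for t in range(1, 101):
    loss = rng.random(A)                # placeholder per-action loss vector
    policy = omd_simplex_update(policy, loss, lr=1.0 / np.sqrt(t))

print(policy)                           # a valid distribution over actions
```

In a decentralized setting, each agent would run such an update on its own local policy using only its observed losses, which is what allows the method to remain oblivious to the other agents.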




