Context-Aware Bayesian Network Actor-Critic Methods for Cooperative Multi-Agent Reinforcement Learning

by   Dingyang Chen, et al.

Executing actions in a correlated manner is a common strategy for human coordination that often leads to better cooperation, which is also potentially beneficial for cooperative multi-agent reinforcement learning (MARL). However, the recent success of MARL relies heavily on the convenient paradigm of purely decentralized execution, where there is no action correlation among agents for scalability considerations. In this work, we introduce a Bayesian network to inaugurate correlations between agents' action selections in their joint policy. Theoretically, we establish a theoretical justification for why action dependencies are beneficial by deriving the multi-agent policy gradient formula under such a Bayesian network joint policy and proving its global convergence to Nash equilibria under tabular softmax policy parameterization in cooperative Markov games. Further, by equipping existing MARL algorithms with a recent method of differentiable directed acyclic graphs (DAGs), we develop practical algorithms to learn the context-aware Bayesian network policies in scenarios with partial observability and various difficulty. We also dynamically decrease the sparsity of the learned DAG throughout the training process, which leads to weakly or even purely independent policies for decentralized execution. Empirical results on a range of MARL benchmarks show the benefits of our approach.


page 1

page 2

page 3

page 4


Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

We explore deep reinforcement learning methods for multi-agent domains. ...

Towards Global Optimality in Cooperative MARL with Sequential Transformation

Policy learning in multi-agent reinforcement learning (MARL) is challeng...

Multi-Agent Actor-Critic with Generative Cooperative Policy Network

We propose an efficient multi-agent reinforcement learning approach to d...

Planning Not to Talk: Multiagent Systems that are Robust to Communication Loss

In a cooperative multiagent system, a collection of agents executes a jo...

Decentralized Policy Optimization

The study of decentralized learning or independent learning in cooperati...

On the Effect of Log-Barrier Regularization in Decentralized Softmax Gradient Play in Multiagent Systems

Softmax policy gradient is a popular algorithm for policy optimization i...

Solving Common-Payoff Games with Approximate Policy Iteration

For artificially intelligent learning systems to have widespread applica...

Please sign up or login with your details

Forgot password? Click here to reset