Value Functions Factorization with Latent State Information Sharing in Decentralized Multi-Agent Policy Gradients

by Hanhan Zhou, et al.

Value function factorization via centralized training and decentralized execution is promising for solving cooperative multi-agent reinforcement learning tasks. One of the approaches in this area, QMIX, has become state-of-the-art and achieved the best performance on the StarCraft II micromanagement benchmark. However, the monotonic mixing of per-agent estimates in QMIX is known to restrict the joint action Q-values it can represent, and each agent has insufficient global state information for estimating its own value function; both limitations often result in suboptimality. To this end, we present LSF-SAC, a novel framework that features a variational inference-based information-sharing mechanism as extra state information to assist individual agents in the value function factorization. We demonstrate that such latent individual state information sharing can significantly expand the power of value function factorization, while fully decentralized execution can still be maintained in LSF-SAC through a soft-actor-critic design. We evaluate LSF-SAC on the StarCraft II micromanagement challenge and demonstrate that it outperforms several state-of-the-art methods in challenging collaborative tasks. We further conduct extensive ablation studies to locate the key factors accounting for its performance improvements. We believe that this new insight can lead to new local value estimation methods and variational deep learning algorithms. A demo video and the implementation code can be found at
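
As a rough illustration of the monotonic mixing constraint mentioned in the abstract, the following minimal sketch combines per-agent Q-values with non-negative weights. It is not the LSF-SAC or QMIX implementation: the function name `monotonic_mix`, the fixed scalar weights, and the toy Q-values are all assumptions made for illustration (in QMIX the weights depend on the global state).

```python
from itertools import product

def monotonic_mix(agent_qs, weights, bias=0.0):
    """Mix per-agent Q-values with non-negative weights (QMIX-style constraint).

    Hypothetical helper: real mixers produce state-dependent weights.
    """
    # abs() forces every weight to be >= 0, so Q_tot never decreases
    # when any single agent's Q-value increases.
    return sum(abs(w) * q for w, q in zip(weights, agent_qs)) + bias

# Toy per-agent Q-values for two agents with two actions each (assumed data).
q1 = [1.0, 3.0]
q2 = [2.0, 0.5]
weights = [0.7, -1.2]  # raw mixer outputs; abs() makes them non-negative

# Joint Q-value for every combination of the two agents' actions.
joint = {
    (a1, a2): monotonic_mix([q1[a1], q2[a2]], weights)
    for a1, a2 in product(range(2), repeat=2)
}

# Under monotonic mixing, the joint greedy action matches the tuple of
# each agent's own greedy action, which is what permits decentralized execution.
greedy_joint = max(joint, key=joint.get)
greedy_local = (q1.index(max(q1)), q2.index(max(q2)))
print(greedy_joint, greedy_local)
```

The same constraint is the source of the restriction the abstract describes: a joint payoff in which one agent's best action depends on the other agent's choice cannot be written in this form.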


NQMIX: Non-monotonic Value Function Factorization for Deep Multi-Agent Reinforcement Learning

Multi-agent value-based approaches recently make great progress, especia...

Learning Nearly Decomposable Value Functions Via Communication Minimization

Reinforcement learning encounters major challenges in multi-agent settin...

S2RL: Do We Really Need to Perceive All States in Deep Multi-Agent Reinforcement Learning?

Collaborative multi-agent reinforcement learning (MARL) has been widely ...

ReMIX: Regret Minimization for Monotonic Value Function Factorization in Multiagent Reinforcement Learning

Value function factorization methods have become a dominant approach for...

Centralizing State-Values in Dueling Networks for Multi-Robot Reinforcement Learning Mapless Navigation

We study the problem of multi-robot mapless navigation in the popular Ce...

Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

In many real-world settings, a team of agents must coordinate its behavi...

QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

In many real-world settings, a team of agents must coordinate their beha...
