Qatten: A General Framework for Cooperative Multiagent Reinforcement Learning
In many real-world settings, a team of cooperative agents must learn to coordinate their behavior under private observations and communication constraints. Deep multiagent reinforcement learning (Deep-MARL) algorithms have shown superior performance on these realistic and difficult problems but still face challenges. One line of work is multiagent value decomposition, which decomposes the global shared multiagent Q-value Q_tot into individual Q-values Q^i that guide each agent's behavior. However, previous work performs this decomposition heuristically, without solid theoretical grounding: VDN assumes a purely additive form, while QMIX adopts an implicit mixing method that is difficult to interpret. In this paper, for the first time, we theoretically derive a linear decomposition from Q_tot into the individual Q^i. Based on this theoretical finding, we introduce a multi-head attention mechanism to approximate each term in the decomposition formula, with theoretical justification. Experiments show that our method outperforms state-of-the-art MARL methods on the widely adopted StarCraft benchmarks across different scenarios, and an analysis of the learned attention weights provides further insight.
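As a rough sketch of the idea (notation assumed here, not taken from this abstract: c(s) denotes a state-dependent bias, lambda_{i,h}(s) the weight that attention head h assigns to agent i, and o_i, a_i agent i's observation and action), the linear decomposition can be read as

    Q_tot(s, a) ≈ c(s) + Σ_{h=1}^{H} Σ_{i=1}^{N} lambda_{i,h}(s) Q^i(o_i, a_i),

where each set of per-agent coefficients lambda_{i,h}(s) is produced by one attention head conditioned on the global state, so the mixing of individual Q-values stays linear and interpretable rather than being absorbed into an opaque mixing network.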