SMACv2: An Improved Benchmark for Cooperative Multi-Agent Reinforcement Learning

by Benjamin Ellis, et al.
University of Oxford

The availability of challenging benchmarks has played a key role in the recent progress of machine learning. In cooperative multi-agent reinforcement learning, the StarCraft Multi-Agent Challenge (SMAC) has become a popular testbed for centralised training with decentralised execution. However, after years of sustained improvement on SMAC, algorithms now achieve near-perfect performance. In this work, we conduct new analysis demonstrating that SMAC is not sufficiently stochastic to require complex closed-loop policies. In particular, we show that an open-loop policy conditioned only on the timestep can achieve non-trivial win rates for many SMAC scenarios. To address this limitation, we introduce SMACv2, a new version of the benchmark where scenarios are procedurally generated and require agents to generalise to previously unseen settings (from the same distribution) during evaluation. We show that these changes ensure the benchmark requires the use of closed-loop policies. We evaluate state-of-the-art algorithms on SMACv2 and show that it presents significant challenges not present in the original benchmark. Our analysis illustrates that SMACv2 addresses the discovered deficiencies of SMAC and can help benchmark the next generation of MARL methods. Videos of training are available at
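The distinction between open-loop and closed-loop policies can be illustrated with a minimal sketch. The toy one-dimensional task below is hypothetical and is not part of SMAC or SMACv2; the names `run_episode`, `open_loop_policy`, `closed_loop_policy`, and `win_rate` are assumptions introduced only for illustration. It shows that a policy conditioned only on the timestep can win every episode when the scenario is fixed, but loses about half of them once the scenario is randomised, whereas a policy that uses its observation is unaffected.

```python
import random

HORIZON = 3  # number of steps per episode in this toy task


def run_episode(policy, goal, horizon=HORIZON):
    """Return True if the agent, starting at 0, ends the episode on `goal`."""
    pos = 0
    for t in range(horizon):
        obs = goal - pos          # signed offset from the agent to the goal
        pos += policy(t, obs)     # action is a step of -1, 0, or +1
    return pos == goal


def open_loop_policy(t, obs):
    # Open-loop: ignores the observation and replays a fixed
    # action sequence indexed only by the timestep.
    return [1, 1, 1][t]


def closed_loop_policy(t, obs):
    # Closed-loop: uses the observation and steps toward the goal.
    return (obs > 0) - (obs < 0)


def win_rate(policy, sample_goal, episodes=1000, seed=0):
    """Fraction of episodes won when goals are drawn from `sample_goal`."""
    rng = random.Random(seed)
    wins = sum(run_episode(policy, sample_goal(rng)) for _ in range(episodes))
    return wins / episodes


def fixed_goal(rng):
    # Deterministic scenario: the goal is always in the same place.
    return 3


def random_goal(rng):
    # Randomised scenario: the goal side is drawn per episode.
    return rng.choice([-3, 3])


if __name__ == "__main__":
    for name, policy in [("open-loop", open_loop_policy),
                         ("closed-loop", closed_loop_policy)]:
        print(name,
              "fixed:", win_rate(policy, fixed_goal),
              "randomised:", win_rate(policy, random_goal))
```

In this sketch, randomising the goal plays the role that procedurally generated scenarios play in the abstract: it removes the regularity that a timestep-only policy relies on.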


