Towards a Standardised Performance Evaluation Protocol for Cooperative MARL

by Rihab Gorsane, et al.

Multi-agent reinforcement learning (MARL) has emerged as a useful approach to solving decentralised decision-making problems at scale. Research in the field has been growing steadily, with many breakthrough algorithms proposed in recent years. In this work, we take a closer look at this rapid development with a focus on the evaluation methodologies employed across a large body of research in cooperative MARL. By conducting a detailed meta-analysis of prior work, spanning 75 papers accepted for publication from 2016 to 2022, we bring to light worrying trends that put into question the true rate of progress. We further consider these trends in a wider context and take inspiration from the single-agent RL literature on similar issues, whose recommendations remain applicable to MARL. Combining these recommendations with novel insights from our analysis, we propose a standardised performance evaluation protocol for cooperative MARL. We argue that such a standard protocol, if widely adopted, would greatly improve the validity and credibility of future research, make replication and reproducibility easier, and improve the field's ability to accurately gauge the rate of progress over time by enabling sound comparisons across different works. Finally, we release our meta-analysis data publicly on our project website for future research on evaluation.


The StarCraft Multi-Agent Challenge

In the last few years, deep multi-agent reinforcement learning (RL) has ...

RIIT: Rethinking the Importance of Implementation Tricks in Multi-Agent Reinforcement Learning

In recent years, Multi-Agent Reinforcement Learning (MARL) has revolutio...

A Review of Cooperative Multi-Agent Deep Reinforcement Learning

Deep Reinforcement Learning has made significant progress in multi-agent...

SMACv2: An Improved Benchmark for Cooperative Multi-Agent Reinforcement Learning

The availability of challenging benchmarks has played a key role in the ...

How I failed machine learning in medical imaging – shortcomings and recommendations

Medical imaging is an important research field with many opportunities f...

Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms

Recent years have witnessed significant advances in reinforcement learni...

Towards Comparability in Non-Intrusive Load Monitoring: On Data and Performance Evaluation

Non-Intrusive Load Monitoring (NILM) comprises of a set of techniques th...
