Discovering Causality for Efficient Cooperation in Multi-Agent Environments

by Rafael Pina, et al.

In cooperative Multi-Agent Reinforcement Learning (MARL), agents are required to learn behaviours as a team to achieve a common goal. However, while learning a task, some agents may end up learning sub-optimal policies that do not contribute to the objective of the team. Such agents are called lazy agents due to their non-cooperative behaviours, which may arise from failing to understand whether they caused the rewards. As a consequence, we observe that the emergence of cooperative behaviours is not necessarily a byproduct of being able to solve a task as a team. In this paper, we investigate the applications of causality in MARL and how it can be used to penalise these lazy agents. We observe that causality estimations can be used to improve the credit assignment to the agents, and we show how this can be leveraged to improve independent learning in MARL. Furthermore, we investigate how Amortized Causal Discovery can be used to automate causality detection within MARL environments. The results demonstrate that causal relations between individual observations and the team reward can be used to detect and punish lazy agents, making them develop more intelligent behaviours. This improves not only the overall performance of the team but also the individual capabilities of its agents. In addition, the results show that Amortized Causal Discovery can be used efficiently to find causal relations in MARL.
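
The credit assignment idea in the abstract can be pictured with a minimal sketch. It assumes that some causality estimator has already produced one boolean flag per agent, saying whether that agent's observations are judged to have caused the team reward. The function name, the flags, and the penalty value are hypothetical and chosen for illustration only; they are not taken from the paper, and the estimation step itself (for example, via Amortized Causal Discovery) is not shown.

```python
from typing import List, Sequence


def assign_individual_rewards(
    team_reward: float,
    caused_reward: Sequence[bool],
    lazy_penalty: float = 0.0,
) -> List[float]:
    """Turn a shared team reward into per-agent rewards.

    caused_reward[i] is an assumed, externally estimated flag: True if
    agent i is judged to have contributed causally to the team reward.
    Agents flagged False receive lazy_penalty instead of the team reward.
    """
    return [team_reward if caused else lazy_penalty for caused in caused_reward]


# Three agents share a team reward of 1.0; the second is flagged as lazy.
rewards = assign_individual_rewards(1.0, [True, False, True], lazy_penalty=-0.1)
```

Under these assumptions, each independent learner would train on its own entry of `rewards` rather than on the shared team reward.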




