A Regularized Opponent Model with Maximum Entropy Objective

05/17/2019
by Zheng Tian, et al.

In the single-agent setting, reinforcement learning (RL) tasks can be cast as an inference problem by introducing a binary random variable o that stands for "optimality". In this paper, we redefine the binary random variable o for the multi-agent setting and formalize multi-agent reinforcement learning (MARL) as probabilistic inference. We derive a variational lower bound on the likelihood of achieving optimality and name it the Regularized Opponent Model with Maximum Entropy Objective (ROMMEO). From ROMMEO, we present a novel perspective on opponent modeling and show, theoretically and empirically, how it can improve the performance of training agents in cooperative games. To optimize ROMMEO, we first introduce a tabular Q-iteration method, ROMMEO-Q, with a proof of convergence. We then extend this exact algorithm to complex environments by proposing an approximate version, ROMMEO-AC. We evaluate ROMMEO-Q on a challenging iterated matrix game and ROMMEO-AC on a differential game, and show that they can outperform strong MARL baselines.
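To make the inference view concrete, the following is a minimal sketch, not the paper's ROMMEO-Q or ROMMEO-AC. It assumes a one-shot two-player cooperative matrix game, a uniform prior over joint actions, and the common control-as-inference choice p(o = 1 | a1, a2) ∝ exp(reward / temperature). The payoff matrix, the temperature, and all function and variable names are hypothetical and chosen only for illustration.

```python
import math

def optimality_posterior(reward, temperature=1.0):
    """Posterior over joint actions given o = 1.

    Assumes a uniform prior over joint actions and
    p(o = 1 | a1, a2) proportional to exp(reward / temperature).
    """
    # Subtract the maximum reward so the exponentials stay numerically stable.
    max_reward = max(max(row) for row in reward)
    weights = [[math.exp((r - max_reward) / temperature) for r in row]
               for row in reward]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]

# Hypothetical 2x2 cooperative payoffs (rows: agent 1, columns: agent 2).
reward = [[2.0, 0.0],
          [0.0, 1.0]]

joint = optimality_posterior(reward, temperature=0.5)

# Marginal over the other agent's action given optimality: p(a2 | o = 1).
opponent_marginal = [sum(joint[i][j] for i in range(2)) for j in range(2)]

# Agent 1's action distribution conditioned on agent 2's action:
# p(a1 | a2, o = 1), stored as response[a1][a2].
response = [[joint[i][j] / opponent_marginal[j] for j in range(2)]
            for i in range(2)]
```

Under these assumptions the posterior concentrates on the high-payoff joint action as the temperature decreases, and the marginal over the other agent's action plays the role of a simple model of an "optimal" partner.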


research
03/01/2023

A Variational Approach to Mutual Information-Based Coordination for Multi-Agent Reinforcement Learning

In this paper, we propose a new mutual information framework for multi-a...
research
06/04/2020

A Maximum Mutual Information Framework for Multi-Agent Reinforcement Learning

In this paper, we propose a maximum mutual information (MMI) framework f...
research
03/22/2021

Regularized Softmax Deep Multi-Agent Q-Learning

Tackling overestimation in Q-learning is an important problem that has b...
research
01/29/2019

Multi Agent Reinforcement Learning with Multi-Step Generative Models

The dynamics between agents and the environment are an important compone...
research
01/07/2021

Attention Actor-Critic algorithm for Multi-Agent Constrained Co-operative Reinforcement Learning

In this work, we consider the problem of computing optimal actions for R...
research
07/12/2021

Explore and Control with Adversarial Surprise

Reinforcement learning (RL) provides a framework for learning goal-direc...
research
03/08/2019

A cooperative game for automated learning of elasto-plasticity knowledge graphs and models with AI-guided experimentation

We introduce a multi-agent meta-modeling game to generate data, knowledg...
