RMIX: Learning Risk-Sensitive Policies for Cooperative Reinforcement Learning Agents

02/16/2021
by Wei Qiu, et al.

Current value-based multi-agent reinforcement learning (MARL) methods optimize individual Q values to guide individuals' behaviours via centralized training with decentralized execution (CTDE). However, such expected, i.e., risk-neutral, Q values are not sufficient even with CTDE, because rewards are random and environments are uncertain; as a result, these methods fail to train coordinating agents in complex environments. To address these issues, we propose RMIX, a novel cooperative MARL method with the Conditional Value at Risk (CVaR) measure over the learned distributions of individuals' Q values. Specifically, we first learn the return distributions of individuals to analytically calculate CVaR for decentralized execution. Then, to handle the temporal nature of the stochastic outcomes during execution, we propose a dynamic risk level predictor for risk level tuning. Finally, we optimize the CVaR policies during centralized training: the CVaR values are used to estimate the target in the TD error, and they also serve as auxiliary local rewards to update the local distributions via a Quantile Regression loss. Empirically, we show that our method significantly outperforms state-of-the-art methods on challenging StarCraft II tasks, demonstrating enhanced coordination and improved sample efficiency.
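To make the idea of calculating CVaR from a learned return distribution concrete, here is a minimal sketch and not the authors' implementation. It assumes each action's return distribution is represented by N equally weighted quantile estimates, and it approximates CVaR at risk level alpha as the mean of the lowest alpha-fraction of those estimates. The function names `cvar_from_quantiles` and `greedy_cvar_action` and the toy numbers are hypothetical, chosen only for illustration.

```python
import numpy as np


def cvar_from_quantiles(quantiles, alpha):
    """Approximate CVaR_alpha from equally weighted quantile estimates.

    quantiles: array of shape (..., N) holding N quantile estimates of the return.
    alpha: risk level in (0, 1]; alpha = 1 recovers the risk-neutral mean.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    # Sort so the worst outcomes come first along the last axis.
    q = np.sort(np.asarray(quantiles, dtype=float), axis=-1)
    # Number of quantiles that fall in the lower alpha tail (at least one).
    k = max(1, int(np.ceil(alpha * q.shape[-1])))
    return q[..., :k].mean(axis=-1)


def greedy_cvar_action(quantiles_per_action, alpha):
    """Pick the action whose return distribution has the largest CVaR_alpha."""
    return int(np.argmax(cvar_from_quantiles(quantiles_per_action, alpha)))


# Toy example (assumed values): action 0 has the higher mean but a heavy lower tail,
# action 1 has a lower mean but a narrow distribution.
toy_quantiles = np.array([
    [-10.0, 0.0, 10.0, 20.0],  # action 0
    [2.0, 3.0, 4.0, 5.0],      # action 1
])

risk_neutral_choice = greedy_cvar_action(toy_quantiles, alpha=1.0)
risk_averse_choice = greedy_cvar_action(toy_quantiles, alpha=0.5)
```

In this toy setting the risk-neutral agent (alpha = 1) prefers action 0, while the risk-averse agent (alpha = 0.5) prefers action 1, which illustrates why a fixed expected Q value can hide a heavy lower tail. A dynamic risk level predictor, as described in the abstract, would supply alpha per step rather than fixing it.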


research
05/31/2022

Learning Generalizable Risk-Sensitive Policies to Coordinate in Decentralized Multi-Agent General-Sum Games

While various multi-agent reinforcement learning methods have been propo...
research
06/22/2021

MMD-MIX: Value Function Factorisation with Maximum Mean Discrepancy for Cooperative Multi-Agent Reinforcement Learning

In the real world, many tasks require multiple agents to cooperate with ...
research
03/14/2023

Learning Adaptable Risk-Sensitive Policies to Coordinate in Multi-Agent General-Sum Games

In general-sum games, the interaction of self-interested learning agents...
research
09/09/2020

QR-MIX: Distributional Value Function Factorisation for Cooperative Multi-Agent Reinforcement Learning

In Cooperative Multi-Agent Reinforcement Learning (MARL) and under the s...
research
02/21/2022

DQMIX: A Distributional Perspective on Multi-Agent Reinforcement Learning

In cooperative multi-agent tasks, a team of agents jointly interact with...
research
02/10/2020

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Recently, deep multiagent reinforcement learning (MARL) has become a hig...
research
03/16/2022

CTDS: Centralized Teacher with Decentralized Student for Multi-Agent Reinforcement Learning

Due to the partial observability and communication constraints in many m...
