Equivariant Networks for Zero-Shot Coordination

10/21/2022
by Darius Muglich, et al.

Successful coordination in Dec-POMDPs requires agents to adopt robust strategies and interpretable styles of play for their partner. A common failure mode is symmetry breaking, where agents arbitrarily converge on one of many equivalent but mutually incompatible policies. Such cases commonly arise under partial observability, e.g. waving your right hand vs. your left hand to convey a covert message. In this paper, we present a novel equivariant network architecture for use in Dec-POMDPs that prevents the agent from learning policies which break symmetries, doing so more effectively than prior methods. Our method also acts as a "coordination-improvement operator" for generic, pre-trained policies, and thus may be applied at test time in conjunction with any self-play algorithm. We provide theoretical guarantees for our method and evaluate it on the AI benchmark task of Hanabi, where we demonstrate that it outperforms other symmetry-aware baselines in zero-shot coordination and improves the coordination ability of a variety of pre-trained policies. In particular, we show our method can be used to improve on the state of the art for zero-shot coordination on the Hanabi benchmark.
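The core idea of an equivariant policy can be sketched with a generic group symmetrizer: average the policy over a finite symmetry group so that symmetric observations are guaranteed to map to correspondingly symmetric action distributions. This is an illustrative construction only, not the paper's exact architecture; the function name `make_equivariant`, the encoding of group elements as permutation pairs, and the toy linear policy below are all assumptions for the sake of a self-contained example.

```python
import numpy as np

def make_equivariant(policy, group):
    """Symmetrize a policy over a finite group G (illustrative sketch).

    `policy` maps an observation vector to action logits.
    `group` is a list of (obs_perm, act_perm) index arrays; each pair encodes
    how one group element g permutes observation features and action logits.
    The returned policy satisfies pi(g . obs) = g . pi(obs) for every g in G,
    i.e. it cannot break the symmetries encoded in the group.
    """
    def equivariant_policy(obs):
        logits = np.zeros_like(policy(obs))
        for obs_perm, act_perm in group:
            # Apply g to the observation, run the base policy, then apply
            # g^{-1} to the resulting logits before averaging over G.
            transformed = policy(obs[obs_perm])
            inv_perm = np.argsort(act_perm)  # index array for g^{-1}
            logits += transformed[inv_perm]
        return logits / len(group)
    return equivariant_policy

# Toy example: a 2-element group {identity, swap of features/actions 0 and 1}.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 3))
base_policy = lambda obs: W @ obs          # arbitrary non-equivariant policy
identity = np.arange(3)
swap = np.array([1, 0, 2])
eq_policy = make_equivariant(base_policy, [(identity, identity), (swap, swap)])

obs = rng.normal(size=3)
# Equivariance check: transforming the observation permutes the logits.
assert np.allclose(eq_policy(obs[swap]), eq_policy(obs)[swap])
```

The symmetrizer only needs black-box access to the base policy, which mirrors the abstract's point that the method can act as a coordination-improvement operator applied at test time to any pre-trained policy.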


Related research

"Other-Play" for Zero-Shot Coordination (03/06/2020)
We consider the problem of zero-shot coordination - constructing AI agen...

A New Formalism, Method and Open Issues for Zero-Shot Coordination (06/11/2021)
In many coordination problems, independently reasoning humans are able t...

PECAN: Leveraging Policy Ensemble for Context-Aware Zero-Shot Human-AI Coordination (01/16/2023)
Zero-shot human-AI coordination holds the promise of collaborating with ...

K-level Reasoning for Zero-Shot Coordination in Hanabi (07/14/2022)
The standard problem setting in cooperative multi-agent settings is self...

Off-Belief Learning (03/06/2021)
The standard problem setting in Dec-POMDPs is self-play, where the goal ...

Learning to Coordinate with Humans using Action Features (01/29/2022)
An unaddressed challenge in human-AI coordination is to enable AI agents...

Any-Play: An Intrinsic Augmentation for Zero-Shot Coordination (01/28/2022)
Cooperative artificial intelligence with human or superhuman proficiency...
