Explore and Control with Adversarial Surprise

by Arnaud Fickinger et al.

Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards. However, since designing rewards often requires substantial engineering effort, we are interested in the problem of learning without rewards, where agents must discover useful behaviors in the absence of task-specific incentives. Intrinsic motivation refers to a family of unsupervised RL techniques that define general objectives whose optimization leads to better exploration or the discovery of skills. In this paper, we propose a new unsupervised RL technique based on an adversarial game that pits two policies against each other to compete over the amount of surprise an RL agent experiences. The policies take turns controlling the agent. The Explore policy maximizes entropy, putting the agent into surprising or unfamiliar situations. The Control policy then takes over and seeks to recover from those situations by minimizing entropy. The game harnesses the power of multi-agent competition to drive the agent to seek out increasingly surprising parts of the environment while learning to gain mastery over them. We show empirically that our method leads to the emergence of complex skills, exhibiting clear phase transitions. Furthermore, we show both theoretically (via a latent state space coverage argument) and empirically that our method can be applied to the exploration of stochastic, partially observed environments. We show that Adversarial Surprise learns more complex behaviors and explores more effectively than competitive baselines, outperforming intrinsic motivation methods based on active inference, novelty seeking (Random Network Distillation (RND)), and multi-agent unsupervised RL (Asymmetric Self-Play (ASP)) in MiniGrid, Atari, and VizDoom environments.
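The reward structure described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the class name, the alternating `phase` flag, and the use of a running diagonal Gaussian as a stand-in for a learned density model are all assumptions made for the example. The key property it demonstrates is the zero-sum split: the Explore policy is rewarded by the surprise (negative log-likelihood) of each observation, while the Control policy receives the negative of that same quantity.

```python
import numpy as np

class SurpriseGame:
    """Toy sketch of the Adversarial Surprise reward structure.

    An episode alternates between an "explore" phase and a "control"
    phase. Surprise is estimated as the negative log-probability of the
    observation under a running diagonal-Gaussian model of past
    observations (a stand-in for a learned density model; hypothetical
    detail, not from the paper).
    """

    def __init__(self, obs_dim):
        self.mean = np.zeros(obs_dim)
        self.var = np.ones(obs_dim)
        self.count = 0

    def surprise(self, obs):
        # Negative log-likelihood under the diagonal Gaussian belief.
        return float(0.5 * np.sum(np.log(2 * np.pi * self.var)
                                  + (obs - self.mean) ** 2 / self.var))

    def update(self, obs):
        # Incremental (Welford-style) mean/variance update.
        self.count += 1
        delta = obs - self.mean
        self.mean += delta / self.count
        self.var += (delta * (obs - self.mean) - self.var) / self.count
        self.var = np.maximum(self.var, 1e-6)  # keep variance positive

    def reward(self, obs, phase):
        # Explore phase: maximize surprise. Control phase: minimize it.
        s = self.surprise(obs)
        self.update(obs)
        return s if phase == "explore" else -s
```

After many identical observations the model's surprise for that observation drops, so the Control policy is rewarded for steering the agent back into familiar states, while the Explore policy profits from anything the model has not yet fit.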
