Deep Reinforcement and InfoMax Learning

06/12/2020
by Bogdan Mazoure, et al.

Our work is based on the hypothesis that a model-free agent whose representations are predictive of properties of future states (beyond expected rewards) will be more capable of solving and adapting to new RL problems. To test that hypothesis, we introduce an objective based on Deep InfoMax (DIM) which trains the agent to predict the future by maximizing the mutual information between its internal representation of successive timesteps. We provide an intuitive analysis of the convergence properties of our approach from the perspective of Markov chain mixing times and argue that convergence of the lower bound on mutual information is related to the inverse absolute spectral gap of the transition model. We test our approach in several synthetic settings, where it successfully learns representations that are predictive of the future. Finally, we augment C51, a strong RL baseline, with our temporal DIM objective and demonstrate improved performance on a continual learning task and on the recently introduced Procgen environment.
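
To make the two technical ingredients of the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes an InfoNCE-style estimator with a plain dot-product critic as the lower bound on the mutual information between representations of successive timesteps, and it assumes the transition model is given as a small row-stochastic matrix. The function names `info_nce_lower_bound` and `absolute_spectral_gap` are hypothetical and chosen only for illustration.

```python
import numpy as np


def info_nce_lower_bound(z_t, z_next):
    """InfoNCE estimate (in nats) of a lower bound on I(z_t; z_{t+1}).

    z_t, z_next: arrays of shape (batch, dim). Row i of each array is a
    positive pair; all other rows in the batch act as negatives.
    The dot-product critic is an assumption made for this sketch.
    """
    scores = z_t @ z_next.T  # (batch, batch) critic scores
    # Numerically stable row-wise log-softmax.
    scores = scores - scores.max(axis=1, keepdims=True)
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    # The estimate can never exceed log(batch).
    return float(np.log(z_t.shape[0]) + np.mean(np.diag(log_probs)))


def absolute_spectral_gap(P):
    """Return 1 minus the second-largest eigenvalue modulus of P.

    P: row-stochastic transition matrix of a finite Markov chain.
    The inverse of this gap is the chain's relaxation time.
    """
    moduli = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    return float(1.0 - moduli[1])


# Toy usage with made-up data (not from the paper).
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 4))
bound = info_nce_lower_bound(z, z)  # identical "successive" representations

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])  # eigenvalues are 1 and 0.7
gap = absolute_spectral_gap(P)
relaxation_time = 1.0 / gap
```

In this sketch a smaller gap means a larger relaxation time, that is, a more slowly mixing chain, which is the quantity the abstract ties to the convergence of the mutual information lower bound.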

research
01/16/2020

MIME: Mutual Information Minimisation Exploration

We show that reinforcement learning agents that learn by surprise (surpr...
research
07/24/2020

Predictive Information Accelerates Learning in RL

The Predictive Information is the mutual information between the past an...
research
07/12/2020

Data-Efficient Reinforcement Learning with Momentum Predictive Representations

While deep reinforcement learning excels at solving tasks where large am...
research
12/13/2021

Continual Learning In Environments With Polynomial Mixing Times

The mixing time of the Markov chain induced by a policy limits performan...
research
04/30/2020

Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning

Learning a good representation is an essential component for deep reinfo...
research
04/11/2020

Improved Speech Representations with Multi-Target Autoregressive Predictive Coding

Training objectives based on predictive coding have recently been shown ...
research
02/09/2018

Optimized Bacteria are Environmental Prediction Engines

Experimentalists have observed phenotypic variability in isogenic bacter...
