The Impact of Non-stationarity on Generalisation in Deep Reinforcement Learning

by Maximilian Igl, et al.

Non-stationarity arises in Reinforcement Learning (RL) even in stationary environments. Most RL algorithms collect new data throughout training, using a non-stationary behaviour policy. Furthermore, training targets in RL can change even with a fixed state distribution when the policy, critic, or bootstrap values are updated. We study these types of non-stationarity in supervised learning settings as well as in RL, finding that they can lead to worse generalisation performance when using deep neural network function approximators. Consequently, to improve generalisation of deep RL agents, we propose Iterated Relearning (ITER). ITER augments standard RL training by repeated knowledge transfer of the current policy into a freshly initialised network, which thereby experiences less non-stationarity during training. Experimentally, we show that ITER improves performance on the challenging generalisation benchmarks ProcGen and Multiroom.
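The knowledge-transfer step that ITER repeats can be illustrated with a toy distillation loop. What follows is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the loss, so the KL objective, the linear softmax policy, and every name here (`policy`, `mean_kl`, `distill`, `TEACHER`, `STATES`) are hypothetical choices made for illustration. The sketch also leaves out the RL objective itself and covers only the transfer of a fixed "current" policy into a freshly initialised one.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy(weights, state):
    """Linear softmax policy; weights has shape [n_actions][n_features]."""
    logits = [sum(w * s for w, s in zip(row, state)) for row in weights]
    return softmax(logits)


def mean_kl(teacher, student, states):
    """Average KL(teacher || student) over a batch of states."""
    total = 0.0
    for s in states:
        p, q = policy(teacher, s), policy(student, s)
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return total / len(states)


def distill(teacher, states, steps=500, lr=0.5, seed=0):
    """Fit a freshly initialised student to the teacher by gradient descent on the KL."""
    rng = random.Random(seed)
    n_actions, n_features = len(teacher), len(teacher[0])
    # Fresh network: small random weights, independent of the teacher's parameters.
    student = [[rng.uniform(-0.01, 0.01) for _ in range(n_features)]
               for _ in range(n_actions)]
    for _ in range(steps):
        grad = [[0.0] * n_features for _ in range(n_actions)]
        for s in states:
            p, q = policy(teacher, s), policy(student, s)
            for a in range(n_actions):
                for f in range(n_features):
                    # d KL(p || q) / d logit_a = q_a - p_a for a softmax student.
                    grad[a][f] += (q[a] - p[a]) * s[f]
        for a in range(n_actions):
            for f in range(n_features):
                student[a][f] -= lr * grad[a][f] / len(states)
    return student


# Hypothetical "current policy" and a fixed batch of states to distill on.
TEACHER = [[1.0, -0.5], [-1.0, 0.3], [0.2, 0.8]]
_rng = random.Random(1)
STATES = [[_rng.uniform(-1.0, 1.0), _rng.uniform(-1.0, 1.0)] for _ in range(32)]

UNTRAINED = [[0.0, 0.0] for _ in TEACHER]  # uniform policy, used as a reference point
STUDENT = distill(TEACHER, STATES)
KL_BEFORE = mean_kl(TEACHER, UNTRAINED, STATES)
KL_AFTER = mean_kl(TEACHER, STUDENT, STATES)
```

Because the student only ever sees a fixed teacher and a fixed batch of states, its training targets stay stationary throughout. This is the property the abstract attributes to the freshly initialised network. In an ITER-style loop the student would then take over as the agent's policy and the cycle would repeat.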


Robust Deep Reinforcement Learning for Quadcopter Control

Deep reinforcement learning (RL) has made it possible to solve complex r...

Deep Reinforcement Learning amidst Lifelong Non-Stationarity

As humans, our goals and our environment are persistently changing throu...

Entropy Regularized Reinforcement Learning with Cascading Networks

Deep Reinforcement Learning (Deep RL) has had incredible achievements on...

Understanding and Preventing Capacity Loss in Reinforcement Learning

The reinforcement learning (RL) problem is rife with sources of non-stat...

An Intrusion Response System utilizing Deep Q-Networks and System Partitions

Intrusion Response is a relatively new field of research. Recent approac...

GDI: Rethinking What Makes Reinforcement Learning Different From Supervised Learning

Deep Q Network (DQN) firstly kicked the door of deep reinforcement learn...
