Catastrophic Interference in Reinforcement Learning: A Solution Based on Context Division and Knowledge Distillation

by Tiantian Zhang et al.
Tsinghua University

The powerful learning ability of deep neural networks enables reinforcement learning (RL) agents to learn competent control policies directly from high-dimensional and continuous environments. In theory, to achieve stable performance, neural networks assume i.i.d. inputs, which unfortunately does not hold in the general RL paradigm, where the training data is temporally correlated and non-stationary. This issue can lead to the phenomenon of "catastrophic interference" and a collapse in performance, as later training is likely to overwrite and interfere with previously learned policies. In this paper, we introduce the concept of "context" into single-task RL and develop a novel scheme, termed Context Division and Knowledge Distillation (CDaKD) driven RL, which divides all states experienced during training into a series of contexts. Its motivation is to mitigate the aforementioned challenge of catastrophic interference in deep RL, thereby improving the stability and plasticity of RL models. At the heart of CDaKD is a value function, parameterized by a neural network feature extractor shared across all contexts, and a set of output heads, each specializing in an individual context. In CDaKD, we exploit online clustering to achieve context division, and interference is further alleviated by a knowledge distillation regularization term on the output layers for learned contexts. In addition, to obtain an effective context division in high-dimensional state spaces (e.g., image inputs), we perform clustering in the lower-dimensional representation space of a randomly initialized convolutional encoder, which is kept fixed throughout training. Our results show that, with various replay memory capacities, CDaKD can consistently improve the performance of existing RL algorithms on classic OpenAI Gym tasks and the more complex high-dimensional Atari tasks, incurring only moderate computational overhead.
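The abstract's core mechanism combines three pieces: a fixed random encoder that maps states to a low-dimensional representation, online clustering of those representations into contexts, and a per-context output head regularized toward a frozen snapshot of itself (knowledge distillation). The sketch below illustrates that pipeline in NumPy under loudly hypothetical choices: a linear random projection stands in for the paper's convolutional encoder, sequential k-means stands in for its online clustering procedure, and all names, dimensions, and the L2 form of the distillation term are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical minimal sketch of CDaKD-style context division.
# All dimensions and names are assumptions for illustration only.
rng = np.random.default_rng(0)

STATE_DIM = 64    # raw state dimension (assumption)
LATENT_DIM = 8    # representation dimension used for clustering (assumption)
N_CONTEXTS = 3    # number of contexts, i.e. output heads (assumption)
N_ACTIONS = 4

# Fixed, randomly initialized linear encoder. In the paper this role is
# played by a randomly initialized convolutional encoder kept frozen.
W_enc = rng.normal(size=(STATE_DIM, LATENT_DIM)) / np.sqrt(STATE_DIM)

def encode(state):
    """Project a raw state into the low-dimensional clustering space."""
    return state @ W_enc

class OnlineKMeans:
    """Sequential (online) k-means used here to stand in for context division."""
    def __init__(self, k, dim):
        self.centers = rng.normal(size=(k, dim))
        self.counts = np.zeros(k)

    def assign_and_update(self, z):
        # Assign z to the nearest center, then move that center toward z
        # with a 1/n step size (standard sequential k-means update).
        dists = np.linalg.norm(self.centers - z, axis=1)
        c = int(np.argmin(dists))
        self.counts[c] += 1
        self.centers[c] += (z - self.centers[c]) / self.counts[c]
        return c

# One linear Q-value head per context over a shared feature space
# (a stand-in for the paper's shared extractor plus per-context heads).
heads = rng.normal(size=(N_CONTEXTS, LATENT_DIM, N_ACTIONS)) * 0.01

def q_values(z, context):
    """Q-values for one encoded state, using that state's context head."""
    return z @ heads[context]

def distill_loss(z_batch, context, frozen_head):
    """L2 distillation regularizer: penalize drift of the current head's
    outputs from a frozen snapshot of that head (sketch of the paper's
    knowledge-distillation term on learned contexts)."""
    cur = z_batch @ heads[context]
    old = z_batch @ frozen_head
    return float(np.mean((cur - old) ** 2))

# Route a small batch of states through the encoder and the clusterer.
clusterer = OnlineKMeans(N_CONTEXTS, LATENT_DIM)
states = rng.normal(size=(10, STATE_DIM))
contexts = [clusterer.assign_and_update(encode(s)) for s in states]
```

In this sketch, each incoming state is encoded, assigned a context by the online clusterer, and evaluated by that context's head; the distillation term would be added to the RL loss whenever a head for a previously learned context is updated, discouraging it from being overwritten.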



Related research:

Dynamics-Adaptive Continual Reinforcement Learning via Progressive Contextualization
Periodic Intra-Ensemble Knowledge Distillation for Reinforcement Learning
Overcoming Catastrophic Interference in Online Reinforcement Learning with Dynamic Self-Organizing Maps
Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings
Same State, Different Task: Continual Reinforcement Learning without Interference
Privileged Knowledge Distillation for Sim-to-Real Policy Generalization
Lifelong Reinforcement Learning with Modulating Masks
