Maslow's Hammer for Catastrophic Forgetting: Node Re-Use vs Node Activation

by Sebastian Lee, et al.

Continual learning, that is, learning new tasks in sequence while maintaining performance on old tasks, remains particularly challenging for artificial neural networks. Surprisingly, the amount of forgetting does not increase with the dissimilarity between the learned tasks, but appears to be worst in an intermediate similarity regime. In this paper we theoretically analyse both a synthetic teacher-student framework and a real data setup to provide an explanation of this phenomenon, which we name the Maslow's hammer hypothesis. Our analysis reveals a trade-off between node activation and node re-use that results in the worst forgetting in the intermediate regime. Using this understanding, we reinterpret popular algorithmic interventions for catastrophic interference in terms of this trade-off and identify the regimes in which they are most effective.




