Does Standard Backpropagation Forget Less Catastrophically Than Adam?

by Dylan R. Ashley, et al.

Catastrophic forgetting remains a severe hindrance to the broad application of artificial neural networks (ANNs); however, it continues to be a poorly understood phenomenon. Despite the extensive amount of work on catastrophic forgetting, we argue that it is still unclear exactly how the phenomenon should be quantified and, moreover, to what degree the choices made when designing learning systems affect the amount of catastrophic forgetting. We use various testbeds from the reinforcement learning and supervised learning literature to (1) provide evidence that the choice of gradient-based optimization algorithm used to train an ANN significantly affects the amount of catastrophic forgetting, and to show that, surprisingly, classical algorithms such as vanilla SGD often experience less catastrophic forgetting than more modern algorithms such as Adam. We also empirically compare four existing metrics for quantifying catastrophic forgetting and (2) show that the measured degree of forgetting is sensitive enough to the choice of metric that switching from one principled metric to another can dramatically change a study's conclusions. Our results suggest that a much more rigorous experimental methodology is required when studying catastrophic forgetting. Based on our results, we recommend that inter-task forgetting in supervised learning be measured with both retention and relearning metrics concurrently, and that intra-task forgetting in reinforcement learning be measured, at the very least, with pairwise interference.
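To make the metric families mentioned above concrete, here is a minimal sketch of how retention, relearning, and pairwise-interference style quantities could be computed from logged training statistics. These are illustrative definitions under our own simplifying assumptions, not the paper's exact formulas; the function names and inputs are hypothetical.

```python
# Hedged sketch of forgetting metrics; not the paper's exact definitions.

def retention(acc_A_before, acc_A_after):
    """Fraction of task-A accuracy retained after subsequently training
    on task B (1.0 = no forgetting, 0.0 = total forgetting)."""
    return acc_A_after / acc_A_before


def relearning_speedup(steps_first_learn, steps_relearn):
    """How much faster task A is relearned (to the same criterion) than it
    was learned initially; values > 1 suggest residual savings."""
    return steps_first_learn / steps_relearn


def pairwise_interference(deltas):
    """Average change in performance on one sample caused by updates
    computed on other samples. Negative mean = interference (forgetting),
    positive mean = positive transfer.

    `deltas` is a list of logged per-pair performance changes."""
    return sum(deltas) / len(deltas)
```

For example, a learner that scores 0.9 on task A, drops to 0.45 after training on task B, but relearns A in 100 steps instead of the original 400 would show `retention = 0.5` alongside `relearning_speedup = 4.0`, which is exactly the kind of disagreement between metrics that motivates measuring both concurrently.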

