Representation Stability as a Regularizer for Improved Text Analytics Transfer Learning

by Matthew Riemer et al.

Although neural networks are well suited to sequential transfer learning tasks, the catastrophic forgetting problem hinders proper integration of prior knowledge. In this work, we propose a solution to this problem: a multi-task objective based on the idea of distillation, together with a mechanism that directly penalizes forgetting at the shared representation layer during the knowledge-integration phase of training. We demonstrate our approach on a Twitter-domain sentiment analysis task with sequential knowledge transfer from four related tasks. We show that our technique outperforms networks fine-tuned to the target task. Additionally, we show, both through empirical evidence and through examples, that it retains useful knowledge from the source tasks that is forgotten during standard fine-tuning. Surprisingly, we find that first distilling a human-made, rule-based sentiment engine into a recurrent neural network and then integrating that knowledge with the target-task data leads to a substantial gain in generalization performance. Our experiments demonstrate the power of multi-source transfer techniques in practical text analytics problems when paired with distillation. In particular, on the SemEval 2016 Task 4 Subtask A dataset (Nakov et al., 2016) we surpass the state of the art established during the competition with a comparatively simple model architecture that is not even competitive when trained only on the labeled task-specific data.
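To make the shape of the objective concrete, here is a minimal NumPy sketch of the kind of combined loss the abstract describes: a hard-label cross-entropy term on the target task, a distillation term against a teacher's softened outputs, and a stability penalty on drift of the shared representation away from the frozen source model. The function name, the weighting scheme (`alpha`, `beta`), and the temperature `T` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax, numerically stabilized."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def transfer_loss(logits, teacher_logits, labels,
                  h_shared, h_source, T=2.0, alpha=0.5, beta=0.1):
    """Sketch of a distillation + representation-stability objective
    (illustrative; not the paper's exact loss):
      - cross-entropy against the hard target-task labels
      - cross-entropy against the teacher's temperature-softened outputs
      - L2 penalty between the current shared representation and the
        frozen source model's representation of the same inputs.
    """
    # Hard-label cross-entropy on the target task.
    p = softmax(logits)
    ce = -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()

    # Distillation: match the teacher's softened distribution.
    q_teacher = softmax(teacher_logits, T)
    log_p_T = np.log(softmax(logits, T) + 1e-12)
    distill = -(q_teacher * log_p_T).sum(axis=-1).mean() * (T ** 2)

    # Representation stability: penalize drift at the shared layer.
    stability = ((h_shared - h_source) ** 2).sum(axis=-1).mean()

    return (1 - alpha) * ce + alpha * distill + beta * stability
```

When the shared representation has not moved (`h_shared == h_source`), the stability term vanishes and the loss reduces to a standard distillation-plus-task objective; drift at the shared layer strictly increases the loss.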




Related papers:

- Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement. Fine-tuning deep neural networks pre-trained on large scale datasets is ...
- AdapterFusion: Non-Destructive Task Composition for Transfer Learning. Current approaches to solving classification tasks in NLP involve fine-t...
- Effective Cross-Task Transfer Learning for Explainable Natural Language Inference with T5. We compare sequential fine-tuning with a model for multi-task learning i...
- Using Transfer Learning for Code-Related Tasks. Deep learning (DL) techniques have been used to support several code-rel...
- PANDA: Prompt Transfer Meets Knowledge Distillation for Efficient Model Adaptation. Prompt-tuning, which freezes pretrained language models (PLMs) and only ...
- Pseudo-task Regularization for ConvNet Transfer Learning. This paper is about regularizing deep convolutional networks (ConvNets) ...
- Efficient Neural Task Adaptation by Maximum Entropy Initialization. Transferring knowledge from one neural network to another has been shown...
