The Importance of Being Parameters: An Intra-Distillation Method for Serious Gains

05/23/2022
by Haoran Xu, et al.

Recent model pruning methods have demonstrated the ability to remove redundant parameters without sacrificing model performance. Common methods remove redundant parameters according to parameter sensitivity, a gradient-based measure reflecting the contribution of each parameter. In this paper, however, we argue that redundant parameters can be trained to make beneficial contributions. We first highlight the large sensitivity (contribution) gap between high-sensitivity and low-sensitivity parameters and show that generalization performance can be significantly improved by balancing the contributions of all parameters. Our goal is to balance the sensitivity of all parameters and encourage all of them to contribute equally. We propose a general, task-agnostic method, intra-distillation, an auxiliary loss appended to the regular training loss to balance parameter sensitivity. Moreover, we design a novel adaptive learning method to control the strength of the intra-distillation loss for faster convergence. Our experiments show the strong effectiveness of our methods on machine translation, natural language understanding, and zero-shot cross-lingual transfer across up to 48 languages, e.g., a gain of 3.54 BLEU on average across 8 language pairs from the IWSLT'14 translation dataset.
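For intuition only, below is a minimal sketch of the gradient-based sensitivity measure the abstract refers to, using the common first-order (Taylor) proxy |gradient × parameter| as a stand-in. The toy model, data, and this particular approximation are illustrative assumptions, not the paper's exact definition or implementation.

```python
import torch
import torch.nn as nn

# Hypothetical sketch (not the paper's code): estimate each parameter's
# "sensitivity", i.e. a gradient-based proxy for how much the training loss
# would change if that parameter were removed. Here we use the common
# first-order approximation |grad * param|; the paper's exact measure and
# its intra-distillation loss are defined in the full text.

def parameter_sensitivity(model: nn.Module, loss: torch.Tensor):
    params = [(n, p) for n, p in model.named_parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, [p for _, p in params], retain_graph=True)
    # Sensitivity of parameter theta_i ~= |g_i * theta_i|
    return {n: (g * p).abs().detach() for (n, p), g in zip(params, grads)}

# Toy usage on a random regression problem.
model = nn.Linear(16, 4)
x, y = torch.randn(8, 16), torch.randn(8, 4)
loss = nn.functional.mse_loss(model(x), y)
for name, sens in parameter_sensitivity(model, loss).items():
    print(name, "mean sensitivity:", sens.mean().item())
```

At a high level, balancing these scores so that no subset of parameters dominates is what the proposed intra-distillation loss targets, and the adaptive learning method mentioned above controls how strongly that auxiliary loss is applied during training.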

Related research

02/06/2022 · No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models
02/10/2023 · Language-Aware Multilingual Machine Translation with Self-Supervised Learning
06/01/2023 · Improved Cross-Lingual Transfer Learning For Automatic Speech Translation
04/15/2021 · Zero-Shot Cross-lingual Semantic Parsing
07/20/2021 · More Parameters? No Thanks!
10/22/2022 · Training Dynamics for Curriculum Learning: A Study on Monolingual and Cross-lingual NLU
