Composable Sparse Fine-Tuning for Cross-Lingual Transfer

by Alan Ansell, et al.

Fine-tuning all parameters of a pre-trained model has become the mainstream approach for transfer learning. To increase its efficiency and prevent catastrophic forgetting and interference, techniques like adapters and sparse fine-tuning have been developed. Adapters are modular, as they can be combined to adapt a model towards different facets of knowledge (e.g., dedicated language and/or task adapters). Sparse fine-tuning is expressive, as it controls the behavior of all model components. In this work, we introduce a new fine-tuning method with both these desirable properties. In particular, we learn sparse, real-valued masks based on a simple variant of the Lottery Ticket Hypothesis. Task-specific masks are obtained from annotated data in a source language, and language-specific masks from masked language modeling in a target language. Both these masks can then be composed with the pre-trained model. Unlike adapter-based fine-tuning, this method neither increases the number of parameters at inference time nor alters the original model architecture. Most importantly, it outperforms adapters in zero-shot cross-lingual transfer by a large margin in a series of multilingual benchmarks, including Universal Dependencies, MasakhaNER, and AmericasNLI. Based on an in-depth analysis, we additionally find that sparsity is crucial to prevent both 1) interference between the fine-tunings to be composed and 2) overfitting. We release the code and models at
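The composition step described in the abstract can be illustrated with a minimal sketch. It treats each sparse fine-tuning as a difference vector that is added to the pre-trained parameters. Everything here is an assumption for illustration, not the authors' released code: the helper names `sparse_diff` and `compose`, the toy parameter values, and the selection rule (keep the `k` parameters that changed most during a full fine-tuning pass). The sketch also omits any retraining of the selected parameters.

```python
def sparse_diff(pretrained, fine_tuned, k):
    """Return a sparse difference vector: keep the k largest changes, zero the rest.

    Assumed selection rule (hypothetical): rank parameters by the absolute
    change between the pre-trained and fine-tuned values.
    """
    deltas = [ft - p for p, ft in zip(pretrained, fine_tuned)]
    ranked = sorted(range(len(deltas)), key=lambda i: abs(deltas[i]), reverse=True)
    keep = set(ranked[:k])
    return [d if i in keep else 0.0 for i, d in enumerate(deltas)]


def compose(pretrained, *diffs):
    """Add any number of sparse difference vectors to the pre-trained parameters."""
    out = list(pretrained)
    for diff in diffs:
        out = [o + d for o, d in zip(out, diff)]
    return out


# Toy parameters (hypothetical values chosen to be exact in floating point).
pretrained = [0.5, -1.0, 2.0, 0.0]
task_tuned = [0.5, -0.25, 2.0, 0.125]  # after task fine-tuning in a source language
lang_tuned = [1.0, -1.0, 2.25, 0.0]    # after masked language modeling in a target language

task_diff = sparse_diff(pretrained, task_tuned, k=1)
lang_diff = sparse_diff(pretrained, lang_tuned, k=1)
adapted = compose(pretrained, task_diff, lang_diff)
```

In this toy case the two difference vectors touch different parameters, so adding them causes no interference, and `adapted` has exactly as many entries as `pretrained`. This mirrors the claim that the method adds no parameters at inference time and leaves the architecture unchanged.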



Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning

Recently, fine-tuning pre-trained cross-lingual models (e.g., multilingu...

Soft Layer Selection with Meta-Learning for Zero-Shot Cross-Lingual Transfer

Multilingual pre-trained contextual embedding models (Devlin et al., 201...

Measuring Catastrophic Forgetting in Cross-Lingual Transfer Paradigms: Exploring Tuning Strategies

The cross-lingual transfer is a promising technique to solve tasks in le...

Consistency Regularization for Cross-Lingual Fine-Tuning

Fine-tuning pre-trained cross-lingual language models can transfer task-...

Analyzing and Reducing the Performance Gap in Cross-Lingual Transfer with Fine-tuning Slow and Fast

Existing research has shown that a multilingual pre-trained language mod...

SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters

Adapter Tuning, which freezes the pretrained language models (PLMs) and ...
