Free Lunch: Robust Cross-Lingual Transfer via Model Checkpoint Averaging

05/26/2023
by Fabian David Schmidt, et al.

Massively multilingual language models have displayed strong performance in zero-shot (ZS-XLT) and few-shot (FS-XLT) cross-lingual transfer setups, where models fine-tuned on task data in a source language are transferred to target language(s) with no annotated instances (ZS-XLT) or only a few (FS-XLT). However, current work typically overestimates model performance, as fine-tuned models are frequently evaluated at the checkpoints that generalize best to validation instances in the target languages. This effectively violates the main assumption of "true" ZS-XLT and FS-XLT: such setups require robust methods that do not depend on labeled target-language data for validation and model selection. In this work, aiming to improve the robustness of "true" ZS-XLT and FS-XLT, we propose a simple and effective method that averages different checkpoints (i.e., model snapshots) during task fine-tuning. We conduct exhaustive ZS-XLT and FS-XLT experiments across higher-level semantic tasks (NLI, extractive QA) and lower-level token classification tasks (NER, POS). The results show that averaging model checkpoints yields systematic and consistent performance gains across diverse target languages in all tasks, and, importantly, substantially desensitizes XLT to varying hyperparameter choices in the absence of target-language validation data. We also show that checkpoint averaging further benefits performance when combined with run averaging (i.e., averaging the parameters of models fine-tuned over independent runs).
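
The core operation the abstract describes is parameter-space averaging of model snapshots. Below is a minimal sketch, assuming PyTorch-style state dicts saved as snapshots during fine-tuning; the function name and file paths are illustrative and not from the paper. Run averaging applies the same operation to the final checkpoints of independent fine-tuning runs.

```python
# Minimal sketch of checkpoint averaging over PyTorch state dicts.
# File names and the snapshot schedule are hypothetical, not the paper's setup.
import torch

def average_checkpoints(paths):
    """Average the parameters of several model snapshots from one fine-tuning run."""
    states = [torch.load(p, map_location="cpu") for p in paths]
    averaged = {}
    for key, ref in states[0].items():
        if torch.is_floating_point(ref):
            # Accumulate in float32 for numerical stability, then cast back.
            mean = sum(s[key].float() for s in states) / len(states)
            averaged[key] = mean.to(ref.dtype)
        else:
            # Non-float entries (e.g., integer buffers) are copied from the first snapshot.
            averaged[key] = ref.clone()
    return averaged

# Hypothetical usage: average epoch-level snapshots, then evaluate zero-shot.
# paths = [f"snapshot_epoch{i}.pt" for i in (1, 2, 3)]
# model.load_state_dict(average_checkpoints(paths))
```

Because averaging happens purely in parameter space after training, it adds no cost at fine-tuning time and requires no target-language validation data, which is what makes it attractive for "true" ZS-XLT and FS-XLT model selection.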
