Distilling the Knowledge of Romanian BERTs Using Multiple Teachers

12/23/2021
by Andrei-Marius Avram, et al.

Running large-scale pre-trained language models in computationally constrained environments remains a challenging, unsolved problem, even as transfer learning from these models has become prevalent in Natural Language Processing tasks. Several solutions, including knowledge distillation, network quantization, and network pruning, have been proposed; however, these approaches focus mostly on the English language, thus widening the gap for low-resource languages. In this work, we introduce three light and fast distilled BERT models for the Romanian language: Distil-BERT-base-ro, Distil-RoBERT-base, and DistilMulti-BERT-base-ro. The first two models were obtained by individually distilling the knowledge of two base versions of Romanian BERTs available in the literature, while the last one was obtained by distilling their ensemble. To our knowledge, this is the first attempt to create publicly available Romanian distilled BERT models, which we thoroughly evaluated on five tasks: part-of-speech tagging, named entity recognition, sentiment analysis, semantic textual similarity, and dialect identification. Our experimental results show that the three distilled models maintain most of their teachers' accuracy, while being twice as fast on a GPU and about 35% smaller. In addition, we further test the similarity between the predictions of our students and their teachers by measuring their label and probability loyalty, together with regression loyalty - a new metric introduced in this work.
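For illustration only, the snippet below sketches how knowledge could be distilled from an ensemble of teachers and how label loyalty could be computed. It uses PyTorch; the function names, the temperature and alpha hyperparameters, and the particular loss combination (hard-label cross-entropy plus a teacher-averaged KL-divergence term) are assumptions made for the sake of the example, not the paper's exact training recipe.

import torch
import torch.nn.functional as F

def multi_teacher_distillation_loss(student_logits, teacher_logits_list, labels,
                                    temperature=2.0, alpha=0.5):
    # Hard-label term: standard cross-entropy against the gold labels.
    ce_loss = F.cross_entropy(student_logits, labels)

    # Soft-label term: KL divergence between the student's and each teacher's
    # temperature-scaled distributions, averaged over the teacher ensemble.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd_loss = student_logits.new_zeros(())
    for teacher_logits in teacher_logits_list:
        p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        kd_loss = kd_loss + F.kl_div(log_p_student, p_teacher, reduction="batchmean")
    kd_loss = kd_loss / len(teacher_logits_list) * temperature ** 2

    return alpha * ce_loss + (1.0 - alpha) * kd_loss

def label_loyalty(student_logits, teacher_logits):
    # Fraction of examples on which the student predicts the same label as the teacher.
    agree = student_logits.argmax(dim=-1) == teacher_logits.argmax(dim=-1)
    return agree.float().mean().item()

In practice, the teacher logits would come from forward passes of the two Romanian BERT teachers over the same batch, with the combined loss back-propagated only through the student.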

