Efficient pre-training objectives for Transformers

04/20/2021
by Luca Di Liello, et al.

The Transformer architecture deeply changed natural language processing, outperforming all previous state-of-the-art models. However, well-known Transformer models such as BERT, RoBERTa, and GPT-2 require a huge compute budget to create high-quality contextualised representations. In this paper, we study several efficient pre-training objectives for Transformer-based models. By testing these objectives on different tasks, we determine which of the ELECTRA model's new features is the most relevant. We confirm that Transformer pre-training improves when the input does not contain masked tokens and that computing the loss over the whole output reduces training time. Moreover, inspired by ELECTRA, we study a model composed of two blocks: a discriminator and a simple generator based on a statistical model that has no impact on computational performance. In addition, we show that eliminating the MASK token and considering the whole output during loss computation are essential choices for improving performance. Furthermore, we show that BERT-like models can be trained efficiently with a discriminative approach, as in ELECTRA, but without an expensive, complex generator. Finally, we show that ELECTRA benefits heavily from a state-of-the-art hyper-parameter search.
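As a concrete illustration of the discriminative objective the abstract describes, the sketch below implements ELECTRA-style replaced-token detection with a cheap statistical generator that samples replacement tokens from a unigram distribution. All names, model sizes, and the uniform unigram probabilities are illustrative assumptions, not the authors' exact setup; the point is simply that the input contains no MASK tokens and the binary loss is computed over every output position.

```python
# Minimal sketch (assumed setup, not the paper's exact implementation):
# replaced-token detection with a statistical (unigram) generator.
import torch
import torch.nn as nn

class TokenDiscriminator(nn.Module):
    """Transformer encoder with a binary head: original vs. replaced token."""
    def __init__(self, vocab_size=30522, d_model=256, n_layers=4, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 1)  # one logit per token position

    def forward(self, ids):
        h = self.encoder(self.embed(ids))
        return self.head(h).squeeze(-1)  # shape: (batch, seq_len)

def corrupt(ids, unigram_probs, replace_prob=0.15):
    """Replace ~15% of tokens with samples drawn from a unigram distribution.
    No [MASK] token is used: every input token is a real vocabulary token."""
    mask = torch.rand_like(ids, dtype=torch.float) < replace_prob
    samples = torch.multinomial(unigram_probs, ids.numel(),
                                replacement=True).view_as(ids)
    corrupted = torch.where(mask, samples, ids)
    labels = (corrupted != ids).float()  # 1 = replaced, 0 = original
    return corrupted, labels

vocab_size, batch, seq_len = 30522, 8, 128
# Uniform probabilities as a stand-in for corpus token frequencies.
unigram_probs = torch.full((vocab_size,), 1.0 / vocab_size)
model = TokenDiscriminator(vocab_size)
loss_fn = nn.BCEWithLogitsLoss()  # loss defined over *every* output position

ids = torch.randint(0, vocab_size, (batch, seq_len))
corrupted, labels = corrupt(ids, unigram_probs)
loss = loss_fn(model(corrupted), labels)
loss.backward()
```

Because the generator here is just frequency-based sampling, it adds essentially no compute on top of the discriminator, in line with the abstract's claim that a statistical generator has no impact on computational performance.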


Related research

10/24/2022 - Effective Pre-Training Objectives for Transformer-based Autoencoders
In this paper, we study trade-offs between efficiency, cost and accuracy...

09/15/2023 - Structural Self-Supervised Objectives for Transformers
This thesis focuses on improving the pre-training of natural language mo...

03/23/2020 - ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
Masked language modeling (MLM) pre-training methods such as BERT corrupt...

11/29/2021 - Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
We present Point-BERT, a new paradigm for learning Transformers to gener...

11/05/2020 - Training Transformers for Information Security Tasks: A Case Study on Malicious URL Prediction
Machine Learning (ML) for information security (InfoSec) utilizes distin...

06/28/2021 - Knowledge Transfer by Discriminative Pre-training for Academic Performance Prediction
The needs for precisely estimating a student's academic performance have...

10/23/2022 - Transformers For Recognition In Overhead Imagery: A Reality Check
There is evidence that transformers offer state-of-the-art recognition p...
