Bag of Tricks for Effective Language Model Pretraining and Downstream Adaptation: A Case Study on GLUE

02/18/2023
by   Qihuang Zhong, et al.

This technical report briefly describes our JDExplore d-team's submission, Vega v1, to the General Language Understanding Evaluation (GLUE) leaderboard. GLUE is a collection of nine natural language understanding tasks, including question answering, linguistic acceptability, sentiment analysis, text similarity, paraphrase detection, and natural language inference. [Method] We investigate several effective strategies and choose their best-performing combination as our training recipe. For the model structure, we employ the vanilla Transformer with disentangled attention as the basic encoder block. For self-supervised training, we employ the representative denoising objective (i.e., replaced token detection) in phase 1 and combine it with a contrastive objective (i.e., sentence-embedding contrastive learning) in phase 2. During fine-tuning, we adopt several advanced techniques, such as transductive fine-tuning, self-calibrated fine-tuning, and adversarial fine-tuning. [Results] According to our submission record (Jan. 2022), with our optimized pretraining and fine-tuning strategies, our 1.3-billion-parameter model sets a new state of the art on 4 of the 9 tasks and achieves the best average score of 91.3. Encouragingly, Vega v1 is the first model to exceed human performance on two challenging tasks, i.e., SST-2 and WNLI. We believe our empirically successful recipe, built from a bag of tricks, could shed new light on developing efficient discriminative large language models.
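To make the phase-2 objective concrete, below is a minimal PyTorch sketch (not the authors' code) of how a replaced-token-detection (RTD) loss can be combined with a SimCSE-style sentence-embedding contrastive loss, as described above. The function names, the two dropout "views" of each sentence, and the weighting factor `lam` are illustrative assumptions, not details taken from the report.

```python
# Hedged sketch: combine a denoising (RTD) objective with a sentence-embedding
# contrastive objective, as in the described phase-2 pretraining.
# All names, shapes, and the weighting `lam` are hypothetical placeholders.
import torch
import torch.nn.functional as F


def rtd_loss(token_logits, replaced_labels, attention_mask):
    """Token-level binary cross-entropy: predict whether each token was replaced."""
    logits = token_logits.squeeze(-1)                      # (batch, seq_len)
    per_token = F.binary_cross_entropy_with_logits(
        logits, replaced_labels.float(), reduction="none"
    )
    mask = attention_mask.float()
    return (per_token * mask).sum() / mask.sum()


def contrastive_loss(emb_view1, emb_view2, temperature=0.05):
    """InfoNCE over sentence embeddings; matching rows of the two views are positives."""
    a = F.normalize(emb_view1, dim=-1)
    b = F.normalize(emb_view2, dim=-1)
    sim = a @ b.t() / temperature                          # (batch, batch)
    labels = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim, labels)


def phase2_loss(token_logits, replaced_labels, attention_mask,
                emb_view1, emb_view2, lam=0.1):
    """Weighted sum of the denoising and contrastive objectives (lam is a guess)."""
    return (rtd_loss(token_logits, replaced_labels, attention_mask)
            + lam * contrastive_loss(emb_view1, emb_view2))
```

In this sketch, phase-1 pretraining would use `rtd_loss` alone, and phase 2 would switch to `phase2_loss`; the actual weighting and embedding construction in Vega v1 may differ.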


