Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE

by Qihuang Zhong, et al.

This technical report briefly describes our JDExplore d-team's Vega v2 submission on the SuperGLUE leaderboard. SuperGLUE is more challenging than the widely used General Language Understanding Evaluation (GLUE) benchmark, comprising eight difficult language understanding tasks that span question answering, natural language inference, word sense disambiguation, coreference resolution, and reasoning. [Method] Instead of arbitrarily increasing the size of a pretrained language model (PLM), we aim to 1) fully extract knowledge from the pretraining data given a fixed parameter budget, e.g., 6B, and 2) effectively transfer this knowledge to downstream tasks. To achieve goal 1), we propose self-evolution learning for PLMs, which wisely predicts the informative tokens that should be masked and supervises the masked language modeling (MLM) process with rectified smooth labels. For goal 2), we leverage the prompt transfer technique to improve performance on low-resource tasks by transferring knowledge from the foundation model and related downstream tasks to the target task. [Results] According to our submission record (Oct. 2022), with our optimized pretraining and fine-tuning strategies, our 6B Vega method achieved new state-of-the-art performance on 4 of the 8 tasks, sitting atop the SuperGLUE leaderboard on Oct. 8, 2022, with an average score of 91.3.
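The two ingredients of self-evolution learning described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the paper's exact procedure: it assumes "informative tokens" are those on which the current model's per-token loss is highest, and that "rectified smooth labels" interpolate the one-hot gold label with the model's own predicted distribution (function names and the `alpha` mixing weight are illustrative assumptions).

```python
import numpy as np

def select_informative_tokens(token_losses, mask_ratio=0.15):
    """Pick the tokens the current model finds hardest (highest loss)
    as masking candidates, instead of masking uniformly at random."""
    n_mask = max(1, int(len(token_losses) * mask_ratio))
    # Indices of the n_mask largest per-token losses.
    return np.argsort(token_losses)[-n_mask:]

def rectified_smooth_labels(one_hot, model_probs, alpha=0.1):
    """Supervise MLM with a target that mixes the gold one-hot label
    with the model's own prediction (a form of self-distilled label
    smoothing); alpha controls how much the model's view is trusted."""
    return (1.0 - alpha) * one_hot + alpha * model_probs

# Toy usage: 4 tokens, mask the hardest half.
losses = np.array([0.1, 2.0, 0.5, 3.0])
masked = select_informative_tokens(losses, mask_ratio=0.5)

target = rectified_smooth_labels(
    one_hot=np.array([0.0, 1.0, 0.0]),
    model_probs=np.array([0.2, 0.6, 0.2]),
)
```

In a real pretraining loop the per-token losses and `model_probs` would come from a forward pass of the PLM being trained, and the smoothed target would replace the one-hot label in the MLM cross-entropy.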

