reStructured Pre-training

06/22/2022
by Weizhe Yuan, et al.

In this work, we try to decipher the internal connections of NLP technology development over the past decades, searching for their essence, which rewards us with a (potential) new learning paradigm for NLP tasks, dubbed reStructured Pre-training (RST). In such a paradigm, the role of data is re-emphasized, and model pre-training and fine-tuning on downstream tasks are viewed as a process of data storing and accessing. Based on that, we operationalize the simple principle that a good storage mechanism should not only be able to cache a large amount of data but also make that data easy to access. We achieve this, after overcoming several engineering challenges, by pre-training models over restructured data that consist of a variety of valuable information instead of raw data. Experimentally, RST models not only surpass strong competitors (e.g., T0) on 52 of 55 popular datasets spanning a variety of NLP tasks, but also achieve superior performance on the National College Entrance Examination in English (Gaokao-English), the most authoritative examination in China. Specifically, the proposed system, Qin, scores 40 points higher than the average student and 15 points higher than GPT3 while using 1/16 of the parameters. In particular, Qin achieves a high score of 138.5 (out of a full mark of 150) on the 2018 English exam (national paper III). We have released the Gaokao Benchmark together with an online submission platform. In addition, we tested our model on the English test of the 2022 College Entrance Examination, held a few days ago (2022.06.08), and it obtained a total score of 134 (vs. GPT3's 108).
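To make the "restructured data" idea more concrete, the minimal sketch below shows one hypothetical way heterogeneous raw records could be stored as uniform (signal, input, output) text pairs that a text-to-text model can later access during pre-training. The signal names, record layouts, and helper functions are illustrative assumptions for exposition, not the authors' released pipeline.

```python
# Minimal sketch (illustrative only): restructuring raw records into
# prompted input/output pairs, i.e. storing varied data signals in a
# uniform text-to-text format. The signal names and record layouts are
# assumptions, not the paper's actual pipeline.
from typing import Dict, Iterable, List, Tuple


def restructure_qa(record: Dict[str, str]) -> Tuple[str, str]:
    """Turn a question-answer record into an (input, output) text pair."""
    source = f"Question: {record['question']}\nAnswer the question."
    return source, record["answer"]


def restructure_summary(record: Dict[str, str]) -> Tuple[str, str]:
    """Turn a document-summary record into an (input, output) text pair."""
    source = f"Document: {record['document']}\nSummarize the document."
    return source, record["summary"]


# Each "signal" maps a raw record type to a restructuring function.
SIGNALS = {
    "qa": restructure_qa,
    "summarization": restructure_summary,
}


def build_pretraining_corpus(
    records: Iterable[Tuple[str, Dict[str, str]]]
) -> List[Tuple[str, str]]:
    """Store heterogeneous raw data as uniform (input, output) pairs."""
    corpus = []
    for signal_name, record in records:
        corpus.append(SIGNALS[signal_name](record))
    return corpus


if __name__ == "__main__":
    raw = [
        ("qa", {"question": "Where is the Gaokao held?", "answer": "China"}),
        ("summarization", {"document": "RST restructures data into signals.",
                           "summary": "RST reuses data as signals."}),
    ]
    for src, tgt in build_pretraining_corpus(raw):
        print(src, "->", tgt)
```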

