CPM: A Large-scale Generative Chinese Pre-trained Language Model

12/01/2020
by Zhengyan Zhang, et al.

Pre-trained Language Models (PLMs) have proven beneficial for various downstream NLP tasks. Recently, GPT-3, with 175 billion parameters and 570GB of training data, drew considerable attention for its capacity for few-shot (and even zero-shot) learning. However, applying GPT-3 to Chinese NLP tasks remains challenging, as its training corpus is primarily English and its parameters are not publicly available. In this technical report, we release the Chinese Pre-trained Language Model (CPM), generatively pre-trained on large-scale Chinese data. To the best of our knowledge, CPM, with 2.6 billion parameters and 100GB of Chinese training data, is the largest Chinese pre-trained language model; it can facilitate downstream Chinese NLP tasks such as conversation, essay generation, cloze tests, and language understanding. Extensive experiments demonstrate that CPM achieves strong performance on many NLP tasks in few-shot (and even zero-shot) settings. The code and parameters are available at https://github.com/TsinghuaAI/CPM-Generate.
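Few-shot learning with a generative model like CPM typically means in-context prompting: a handful of labeled demonstrations are placed directly in the prompt, and the model is asked to continue the pattern for a new input. The sketch below only builds such a prompt string; the sentiment task, the demonstrations, and the "→" template are illustrative assumptions, not taken from the paper, and actually querying CPM would require loading the released model separately.

```python
def build_few_shot_prompt(examples, query, template="{text} → {label}"):
    """Concatenate labeled demonstrations followed by the unlabeled query.

    A generative LM is then expected to continue the pattern and emit the
    query's label as its next tokens.
    """
    lines = [template.format(text=t, label=l) for t, l in examples]
    # Leave the label slot empty for the query so the model fills it in.
    lines.append(template.format(text=query, label="").rstrip())
    return "\n".join(lines)

# Hypothetical Chinese sentiment demonstrations (positive/negative labels).
demos = [
    ("这部电影很精彩", "正面"),  # "This movie is wonderful" -> positive
    ("服务态度太差了", "负面"),  # "The service was terrible" -> negative
]
prompt = build_few_shot_prompt(demos, "菜品新鲜又美味")
print(prompt)
```

The same pattern with zero demonstrations (an empty `examples` list plus a task description in the query) corresponds to the zero-shot setting evaluated in the report.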


