Revisiting Pre-Trained Models for Chinese Natural Language Processing

04/29/2020
by Yiming Cui, et al.

Bidirectional Encoder Representations from Transformers (BERT) has brought marked improvements across various NLP tasks, and numerous variants have been proposed to further improve the performance of pre-trained models. In this paper, we revisit Chinese pre-trained models to examine their effectiveness in a non-English language, and we release the Chinese pre-trained model series to the community. We also propose a simple but effective model called MacBERT, which improves upon RoBERTa in several ways, especially in its masking strategy. We carry out extensive experiments on a range of Chinese NLP tasks, from sentence level to document level, to evaluate the existing pre-trained models as well as the proposed MacBERT. Experimental results show that MacBERT achieves state-of-the-art performance on many NLP tasks, and we also report several ablation findings that may help future research.
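A distinguishing feature of MacBERT is its masking strategy: instead of corrupting inputs with the artificial [MASK] token, it masks words by substituting similar words, turning masked language modeling into a correction-style task. The minimal Python sketch below illustrates the idea; the `similar_word` lookup, the 15% masking ratio, and the 80/10/10 substitution split are illustrative assumptions, not the paper's exact configuration.

```python
import random

def mask_tokens(tokens, similar_word, mask_prob=0.15):
    """Corrupt a token sequence MacBERT-style: selected positions are
    replaced by similar words instead of a [MASK] symbol, and the model
    must recover the originals (the returned labels)."""
    masked = list(tokens)
    labels = [None] * len(tokens)  # None = position not selected for prediction
    for i, tok in enumerate(tokens):
        if random.random() < mask_prob:
            labels[i] = tok            # target the model must reconstruct
            r = random.random()
            if r < 0.8:                # mostly substitute a similar word
                masked[i] = similar_word(tok)
            elif r < 0.9:              # occasionally a random token
                masked[i] = random.choice(tokens)
            # else: leave the original token in place
    return masked, labels

# Toy usage: a dictionary lookup stands in for a word-embedding
# nearest-neighbour search over a real vocabulary.
toy_similar = {"quick": "fast", "lazy": "idle"}
tokens = ["the", "quick", "brown", "fox", "jumps"]
masked, labels = mask_tokens(tokens, lambda t: toy_similar.get(t, t))
```

Because the corrupted input contains only real words, the pre-training inputs stay closer to what the model sees at fine-tuning time, where no [MASK] token ever appears.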


Related research

06/19/2019 · Pre-Training with Whole Word Masking for Chinese BERT
Bidirectional Encoder Representations from Transformers (BERT) has shown...

04/03/2023 · MiniRBT: A Two-stage Distilled Small Chinese Pre-trained Model
In natural language processing, pre-trained language models have become ...

04/09/2021 · Transformers: "The End of History" for NLP?
Recent advances in neural architectures, such as the Transformer, couple...

09/11/2020 · A Comparison of LSTM and BERT for Small Corpus
Recent advancements in the NLP field showed that transfer learning helps...

05/02/2021 · MathBERT: A Pre-Trained Model for Mathematical Formula Understanding
Large-scale pre-trained models like BERT, have obtained a great success ...

10/03/2021 · Adversarial Examples Generation for Reducing Implicit Gender Bias in Pre-trained Models
Over the last few years, Contextualized Pre-trained Neural Language Mode...

05/23/2023 · Eliminating Spurious Correlations from Pre-trained Models via Data Mixing
Machine learning models pre-trained on large datasets have achieved rema...
