Weighted Sampling for Masked Language Modeling

by Linhan Zhang, et al.
Alibaba Group
The Hong Kong University of Science and Technology

Masked Language Modeling (MLM) is widely used to pre-train language models. The standard random masking strategy in MLM causes the pre-trained language models (PLMs) to be biased toward high-frequency tokens. Representation learning of rare tokens is poor, and PLMs have limited performance on downstream tasks. To alleviate this frequency bias, we propose two simple and effective Weighted Sampling strategies for masking tokens, based on token frequency and training loss. We apply these two strategies to BERT and obtain Weighted-Sampled BERT (WSBERT). Experiments on the Semantic Textual Similarity (STS) benchmark show that WSBERT significantly improves sentence embeddings over BERT. Combining WSBERT with calibration methods and prompt learning further improves sentence embeddings. We also investigate fine-tuning WSBERT on the GLUE benchmark and show that Weighted Sampling improves the transfer learning capability of the backbone PLM. We further analyze how WSBERT improves token embeddings and provide insights into the effect.
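
To make the idea of frequency-based weighted masking concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not give the weighting formula, so the inverse-square-root weight used here is an assumption, as are the function names `frequency_weights` and `weighted_mask`, the 15% mask ratio, and the toy corpus. The loss-based strategy is not sketched because the abstract gives no detail on it.

```python
import math
import random
from collections import Counter


def frequency_weights(corpus):
    """Map each token to a weight that shrinks as its corpus count grows."""
    counts = Counter(tok for sent in corpus for tok in sent)
    # Assumption: weight = 1 / sqrt(count). The paper's formula may differ.
    return {tok: 1.0 / math.sqrt(c) for tok, c in counts.items()}


def weighted_mask(tokens, weights, mask_ratio=0.15, mask_token="[MASK]", rng=None):
    """Mask roughly mask_ratio of the positions, favoring high-weight tokens.

    Positions are drawn without replacement using Efraimidis-Spirakis keys
    (u ** (1 / w)); the positions with the largest keys are masked.
    """
    rng = rng or random.Random()
    n_mask = max(1, round(len(tokens) * mask_ratio))
    keys = [
        (rng.random() ** (1.0 / weights.get(tok, 1.0)), i)
        for i, tok in enumerate(tokens)
    ]
    chosen = {i for _, i in sorted(keys, reverse=True)[:n_mask]}
    masked = [mask_token if i in chosen else tok for i, tok in enumerate(tokens)]
    return masked, sorted(chosen)


# Toy corpus (hypothetical): "the" is frequent, "axolotl" is rare.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "axolotl", "ran"]]
weights = frequency_weights(corpus)
masked, positions = weighted_mask(["the", "axolotl", "ran"], weights)
print(masked, positions)
```

Under uniform random masking every position in the sentence would be equally likely to be masked; with the assumed weights above, the rare token "axolotl" is selected more often than the frequent token "the", which is the direction of correction the abstract describes.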

