CMLM-CSE: Based on Conditional MLM Contrastive Learning for Sentence Embeddings

by Wei Zhang, et al.

Traditional contrastive-learning approaches to sentence embedding use an encoder to extract sentence features directly and then feed them into a contrastive loss function for learning. However, this approach focuses on the sentence as a whole and ignores the influence that individual words have on sentence semantics. To address this, we propose CMLM-CSE, an unsupervised contrastive learning framework based on conditional MLM. On top of traditional contrastive learning, an auxiliary network is added that incorporates the sentence embedding into an MLM task, forcing the sentence embedding to learn more information about the masked words. With BERT-base as the pretrained language model, our method exceeds SimCSE by 0.55 percentage points on average on textual similarity tasks; with RoBERTa-base as the pretrained language model, it exceeds SimCSE by 0.3 percentage points on average.
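The abstract describes a two-part objective: a standard contrastive (InfoNCE-style) loss over sentence embeddings, plus an auxiliary masked-language-modeling loss whose prediction head is conditioned on the sentence embedding. The sketch below, in NumPy, illustrates one plausible form of this combined objective; the additive conditioning, the projection matrix `W`, and the weighting factor `lam` are assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

def info_nce(z1, z2, tau=0.05):
    """Contrastive loss between two views of a batch of sentence embeddings."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau                      # cosine similarity / temperature
    logits = sim - sim.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # matching pairs sit on the diagonal
    return -np.mean(np.diag(log_prob))

def conditional_mlm_loss(token_states, sent_emb, W, targets):
    """Auxiliary MLM loss: predict masked tokens conditioned on the sentence
    embedding (additive conditioning here is an assumption)."""
    h = token_states + sent_emb                # inject sentence-level information
    logits = h @ W                             # project to vocabulary
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(log_prob[np.arange(len(targets)), targets])

# Toy shapes: one masked token per sentence, small vocabulary.
rng = np.random.default_rng(0)
batch, dim, vocab = 4, 8, 16
z1 = rng.normal(size=(batch, dim))             # sentence embeddings, view 1
z2 = rng.normal(size=(batch, dim))             # sentence embeddings, view 2
token_states = rng.normal(size=(batch, dim))   # hidden states at masked positions
W = rng.normal(size=(dim, vocab))              # hypothetical MLM output projection
targets = rng.integers(0, vocab, size=batch)   # masked-token ids

lam = 0.1  # weight between the two objectives (hypothetical value)
total = info_nce(z1, z2) + lam * conditional_mlm_loss(token_states, z1, W, targets)
```

Because the MLM head can only recover the masked tokens through `sent_emb`, gradients from the auxiliary loss push the sentence embedding to retain word-level information, which is the effect the paper attributes to its gains over SimCSE.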




