SDA: Simple Discrete Augmentation for Contrastive Sentence Representation Learning

10/08/2022
by Zhenyu Mao, et al.

Contrastive learning methods achieve state-of-the-art results in unsupervised sentence representation learning. Although data augmentation plays an essential role in contrastive learning, augmentation methods for sentences have not been fully explored. The current state-of-the-art method, SimCSE, uses a simple dropout mechanism as continuous augmentation, which outperforms discrete augmentations such as cropping, word deletion, and synonym replacement. To understand the underlying rationale, we revisit existing approaches and hypothesize the desiderata of a reasonable data augmentation method: a balance between semantic consistency and expression diversity. Based on this hypothesis, we propose three simple yet effective discrete sentence augmentation methods: punctuation insertion, affirmative auxiliary, and double negation. The inserted punctuation marks, auxiliaries, and negation words act as minimal lexical-level noise that produces diverse sentence expressions. Unlike traditional augmentation methods that modify sentences randomly, our augmentation rules are designed to generate semantically consistent and grammatically correct sentences. We conduct extensive experiments on both English and Chinese semantic textual similarity datasets. The results demonstrate the robustness and effectiveness of the proposed methods.
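To make the three discrete augmentations concrete, here is a minimal sketch of what such rules could look like. The exact insertion positions, punctuation sets, and sentence templates are assumptions for illustration; the paper's actual rules may differ.

```python
import random

# Illustrative set of punctuation marks; the paper's set may differ.
PUNCTUATION = [",", ".", ";", ":", "!", "?"]

def punctuation_insertion(sentence: str, n: int = 1, seed: int = 0) -> str:
    """Insert n punctuation marks at random word boundaries."""
    rng = random.Random(seed)
    words = sentence.split()
    for _ in range(n):
        pos = rng.randrange(len(words) + 1)
        words.insert(pos, rng.choice(PUNCTUATION))
    return " ".join(words)

def affirmative_auxiliary(sentence: str) -> str:
    """Prepend an affirmative auxiliary phrase (hypothetical template)."""
    return "It is true that " + sentence[0].lower() + sentence[1:]

def double_negation(sentence: str) -> str:
    """Wrap the sentence in two negations that cancel out (hypothetical template)."""
    return "It is not false that " + sentence[0].lower() + sentence[1:]

s = "The movie was fantastic."
print(punctuation_insertion(s, n=2))
print(affirmative_auxiliary(s))  # It is true that the movie was fantastic.
print(double_negation(s))        # It is not false that the movie was fantastic.
```

Each transformation keeps the sentence's meaning essentially intact while changing its surface form, which matches the stated desideratum of balancing semantic consistency with expression diversity.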


Related research:

- Adversarial Graph Contrastive Learning with Information Regularization (02/14/2022): Contrastive learning is an effective unsupervised method in graph repres...
- SNCSE: Contrastive Learning for Unsupervised Sentence Embedding with Soft Negative Samples (01/16/2022): Unsupervised sentence embedding aims to obtain the most appropriate embe...
- TransAug: Translate as Augmentation for Sentence Embeddings (10/30/2021): While contrastive learning greatly advances the representation of senten...
- Virtual Augmentation Supported Contrastive Learning of Sentence Representations (10/16/2021): Despite profound successes, contrastive representation learning relies o...
- ImSimCSE: Improving Contrastive Learning for Sentence Embeddings from Two Perspectives (05/22/2023): This paper aims to improve contrastive learning for sentence embeddings ...
- S-SimCSE: Sampled Sub-networks for Contrastive Learning of Sentence Embedding (11/23/2021): Contrastive learning has been studied for improving the performance of l...
- MED-SE: Medical Entity Definition-based Sentence Embedding (12/09/2022): We propose Medical Entity Definition-based Sentence Embedding (MED-SE), ...
