TransAug: Translate as Augmentation for Sentence Embeddings

10/30/2021
by Jue Wang, et al.

While contrastive learning has greatly advanced sentence embedding representations, it is still limited by the size of existing sentence datasets. In this paper, we present TransAug (Translate as Augmentation), which provides the first exploration of using translated sentence pairs as data augmentation for text, and introduce a two-stage paradigm that advances the state of the art in sentence embeddings. Instead of adopting an encoder trained in another language setting, we first distill a Chinese encoder from a SimCSE encoder (pretrained on English), so that their embeddings are close in semantic space, which can be regarded as implicit data augmentation. Then, we update only the English encoder via cross-lingual contrastive learning while keeping the distilled Chinese encoder frozen. Our approach achieves a new state of the art on standard semantic textual similarity (STS) tasks, outperforming both SimCSE and Sentence-T5, and the best performance in the corresponding tracks of the transfer tasks evaluated by SentEval.
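The second stage described above pairs each English sentence with its translation, treats the frozen Chinese encoder's embedding of the translation as the positive, and trains the English encoder with a contrastive objective. The paper does not give the exact loss here, but a minimal sketch of a standard InfoNCE-style cross-lingual contrastive loss (function name, NumPy implementation, and temperature value are illustrative assumptions, not the authors' code) might look like this:

```python
import numpy as np

def cross_lingual_info_nce(en_emb, zh_emb, temperature=0.05):
    """Sketch of an InfoNCE-style cross-lingual contrastive loss.

    en_emb: (N, d) embeddings from the trainable English encoder.
    zh_emb: (N, d) embeddings of the translated sentences from the
            frozen distilled Chinese encoder; row i is assumed to be
            the translation of English sentence i, so the diagonal of
            the similarity matrix holds the positives and all other
            entries in the same row serve as in-batch negatives.
    """
    # L2-normalize so dot products become cosine similarities.
    en = en_emb / np.linalg.norm(en_emb, axis=1, keepdims=True)
    zh = zh_emb / np.linalg.norm(zh_emb, axis=1, keepdims=True)
    sim = en @ zh.T / temperature                 # (N, N) logits
    # Softmax cross-entropy with the matching translation as the label.
    sim = sim - sim.max(axis=1, keepdims=True)    # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

The loss is near zero when each English embedding is far closer to its own translation than to the other translations in the batch, and grows as the pairing degrades; gradients flow only into the English encoder since the Chinese embeddings are produced by a frozen model.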


Related research:

- 08/19/2021: Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models. "We provide the first exploration of text-to-text transformers (T5) sente..."
- 04/18/2021: SimCSE: Simple Contrastive Learning of Sentence Embeddings. "This paper presents SimCSE, a simple contrastive learning framework that..."
- 10/08/2022: SDA: Simple Discrete Augmentation for Contrastive Sentence Representation Learning. "Contrastive learning methods achieve state-of-the-art results in unsuper..."
- 11/23/2021: S-SimCSE: Sampled Sub-networks for Contrastive Learning of Sentence Embedding. "Contrastive learning has been studied for improving the performance of l..."
- 07/20/2023: Identical and Fraternal Twins: Fine-Grained Semantic Contrastive Learning of Sentence Representations. "The enhancement of unsupervised learning of sentence representations has..."
- 09/16/2023: Leveraging Multi-lingual Positive Instances in Contrastive Learning to Improve Sentence Embedding. "Learning multi-lingual sentence embeddings is a fundamental and signific..."
- 02/26/2022: Toward Interpretable Semantic Textual Similarity via Optimal Transport-based Contrastive Sentence Learning. "Recently, finetuning a pretrained language model to capture the similari..."
