Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models

08/19/2021
by Jianmo Ni et al.

We provide the first exploration of sentence embeddings from text-to-text transformers (T5). Sentence embeddings are broadly useful for language processing tasks. While T5 achieves impressive performance on language tasks cast as sequence-to-sequence mapping problems, it is unclear how to produce sentence embeddings from encoder-decoder models. We investigate three methods for extracting T5 sentence embeddings: two use only the T5 encoder and one uses the full T5 encoder-decoder model. Our encoder-only models outperform BERT-based sentence embeddings on both transfer tasks and semantic textual similarity (STS), and our encoder-decoder method achieves a further improvement on STS. Scaling T5 from millions to billions of parameters yields consistent improvements on downstream tasks. Finally, we introduce a two-stage contrastive learning approach that achieves a new state of the art on STS using sentence embeddings, outperforming both Sentence-BERT and SimCSE.
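The two encoder-only extraction strategies can be illustrated with the Hugging Face transformers library. The sketch below is a minimal illustration, not the paper's exact setup: the "t5-base" checkpoint, the pooling details, and the cosine-similarity comparison are all assumptions for demonstration. The third strategy (not shown) runs the full encoder-decoder model and reads a sentence embedding off the decoder side.

```python
# Minimal sketch of two encoder-only ways to get T5 sentence embeddings.
# Checkpoint choice and pooling details are illustrative assumptions.
import torch
from transformers import T5Tokenizer, T5EncoderModel

tokenizer = T5Tokenizer.from_pretrained("t5-base")
encoder = T5EncoderModel.from_pretrained("t5-base")

sentences = ["A man is playing a guitar.", "Someone plays an instrument."]
batch = tokenizer(sentences, padding=True, return_tensors="pt")

with torch.no_grad():
    token_states = encoder(**batch).last_hidden_state  # (batch, seq_len, hidden)

# Strategy 1: use the representation of the first encoder token.
first_token_emb = token_states[:, 0]

# Strategy 2: mean-pool token states, masking out padding positions.
mask = batch.attention_mask.unsqueeze(-1).float()      # (batch, seq_len, 1)
mean_emb = (token_states * mask).sum(1) / mask.sum(1)

# Compare the two sentences with cosine similarity.
sim = torch.nn.functional.cosine_similarity(mean_emb[0], mean_emb[1], dim=0)
print(float(sim))
```

The two-stage approach rests on a contrastive objective over sentence pairs with in-batch negatives. Below is a hedged sketch of such a loss, assuming cosine similarity scaled by a temperature; the temperature value and function name are placeholders, not details taken from the paper.

```python
# Sketch of an in-batch contrastive loss over paired sentence embeddings.
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(anchors, positives, temperature=0.05):
    """anchors, positives: (batch, dim) embeddings of paired sentences.
    Each anchor's positive is the same-index row; all other rows in the
    batch serve as negatives."""
    anchors = F.normalize(anchors, dim=-1)
    positives = F.normalize(positives, dim=-1)
    logits = anchors @ positives.t() / temperature   # (batch, batch) similarities
    labels = torch.arange(logits.size(0))            # matching pair on the diagonal
    return F.cross_entropy(logits, labels)
```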


