Contrastive Visual-Linguistic Pretraining

07/26/2020
by Lei Shi, et al.

Several multi-modality representation learning approaches, such as LXMERT and ViLBERT, have been proposed recently. These approaches achieve superior performance thanks to the high-level semantic information captured during large-scale multimodal pretraining. However, because ViLBERT and LXMERT adopt visual region regression and classification losses, they often suffer from domain gap and noisy label problems, as their visual features are pretrained on the Visual Genome dataset. To overcome these issues, we propose unbiased Contrastive Visual-Linguistic Pretraining (CVLP), which constructs a visual self-supervised loss built upon contrastive learning. We evaluate CVLP on several downstream tasks, including VQA, GQA and NLVR2, to validate the superiority of contrastive learning for multi-modality representation learning. Our code is available at: https://github.com/ArcherYunDong/CVLP-.
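The core technical piece is the visual self-supervised loss built on contrastive learning, which replaces the region regression and classification targets used by LXMERT and ViLBERT. The sketch below shows a minimal, generic InfoNCE-style contrastive loss in PyTorch, assuming in-batch negatives, L2-normalized features, and a temperature hyperparameter; the function and variable names are placeholders chosen here for illustration and are not the authors' released implementation (see the linked repository for that).

# Illustrative sketch of an InfoNCE-style contrastive loss over pooled visual
# features. Assumptions: in-batch negatives, L2-normalized embeddings, and a
# temperature hyperparameter; this is not the CVLP reference code.
import torch
import torch.nn.functional as F


def contrastive_loss(query_feats: torch.Tensor,
                     key_feats: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    # query_feats: (N, D) features from the online encoder.
    # key_feats:   (N, D) features from a second view (e.g., a momentum encoder).
    q = F.normalize(query_feats, dim=-1)
    k = F.normalize(key_feats, dim=-1)
    logits = q @ k.t() / temperature              # (N, N) similarity matrix
    targets = torch.arange(q.size(0), device=q.device)
    # Each query's positive key sits on the diagonal; all other keys act as negatives.
    return F.cross_entropy(logits, targets)


# Usage: pool features from two views of the same image regions and minimize the loss.
q = torch.randn(32, 768)
k = torch.randn(32, 768)
loss = contrastive_loss(q, k)

Minimizing this loss pulls each query toward its matching key and pushes it away from the other keys in the batch, giving a self-supervised visual signal that does not depend on noisy region labels.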
