UniVSE: Robust Visual Semantic Embeddings via Structured Semantic Representations

04/11/2019
by Hao Wu, et al.

We propose Unified Visual-Semantic Embeddings (UniVSE) for learning a joint space of visual and textual concepts. The space unifies concepts at different levels, including objects, attributes, relations, and full scenes. We propose a contrastive learning approach that learns fine-grained alignment from image-caption pairs alone. We also present an effective approach for enforcing coverage of the semantic components that appear in a sentence. We demonstrate the robustness of UniVSE in defending against text-domain adversarial attacks on cross-modal retrieval tasks. This robustness also enables the use of visual cues to resolve word dependencies in novel sentences.
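
The abstract does not state the loss itself. As a rough illustration of what contrastive alignment between image and caption embeddings can look like, the sketch below uses a hinge-based bidirectional ranking loss over cosine similarities, a common choice in visual-semantic embedding work. The function names, the margin of 0.2, and the toy vectors are all assumptions made for illustration and are not taken from the paper.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    u, v = l2_normalize(u), l2_normalize(v)
    return sum(a * b for a, b in zip(u, v))


def contrastive_ranking_loss(image_embs, caption_embs, margin=0.2):
    """Hypothetical hinge-based ranking loss over a batch.

    image_embs[i] and caption_embs[i] are assumed to be a matched pair;
    every other combination in the batch is treated as a negative.
    """
    n = len(image_embs)
    sim = [[cosine(image_embs[i], caption_embs[j]) for j in range(n)]
           for i in range(n)]
    loss = 0.0
    for i in range(n):
        pos = sim[i][i]
        for j in range(n):
            if j == i:
                continue
            # Image i should score higher with its own caption than with caption j.
            loss += max(0.0, margin + sim[i][j] - pos)
            # Caption i should score higher with its own image than with image j.
            loss += max(0.0, margin + sim[j][i] - pos)
    return loss


# Toy batch: two 2-D "image" embeddings and their captions (assumed values).
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_captions = [[1.0, 0.0], [0.0, 1.0]]
swapped_captions = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = contrastive_ranking_loss(images, aligned_captions)
swapped_loss = contrastive_ranking_loss(images, swapped_captions)
```

In this toy batch the correctly aligned pairs incur no penalty, while swapping the captions yields a positive loss. The loss described in the abstract additionally aligns components such as objects, attributes, and relations, which this sketch does not attempt to model.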

research
04/11/2019

Unified Visual-Semantic Embeddings: Bridging Vision and Language with Structured Meaning Representations

We propose the Unified Visual-Semantic Embeddings (Unified VSE) for lear...
research
03/01/2020

Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning

Cross-modal retrieval between videos and texts has attracted growing att...
research
03/27/2023

Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens

Contrastive learning-based vision-language pre-training approaches, such...
research
08/18/2022

See Finer, See More: Implicit Modality Alignment for Text-based Person Retrieval

Text-based person retrieval aims to find the query person based on a tex...
research
05/20/2023

Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment

Text-video retrieval is a challenging cross-modal task, which aims to al...
research
06/27/2018

Learning Visually-Grounded Semantics from Contrastive Adversarial Samples

We study the problem of grounding distributional representations of text...
research
05/13/2022

Modeling Semantic Composition with Syntactic Hypergraph for Video Question Answering

A key challenge in video question answering is how to realize the cross-...
