"Is this an example image?" -- Predicting the Relative Abstractness Level of Image and Text

01/23/2019
by Christian Otto, et al.

Successful multimodal search and retrieval requires the automatic understanding of semantic cross-modal relations, which is still an open research problem. Previous work has suggested two metrics, cross-modal mutual information and semantic correlation, to model and predict cross-modal semantic relations between image and text. In this paper, we present an approach to predict the (cross-modal) relative abstractness level of a given image-text pair, that is, whether the image is an abstraction of the text or vice versa. For this purpose, we introduce a new metric, the Abstractness Level (ABS), which captures this specific relationship between image and text. We present a deep learning approach to predict this metric; it relies on an autoencoder architecture that allows us to significantly reduce the required amount of labeled training data. A comprehensive set of publicly available scientific documents has been gathered. Experimental results on a challenging test set demonstrate the feasibility of the approach.

