Continual learning in cross-modal retrieval

04/14/2021
by   Kai Wang, et al.

Multimodal representations and continual learning are two areas closely related to human intelligence. The former considers the learning of shared representation spaces where information from different modalities can be compared and integrated (we focus on cross-modal retrieval between language and visual representations). The latter studies how to prevent forgetting a previously learned task when learning a new one. While humans excel at both, deep neural networks remain quite limited. In this paper, we combine the two problems into a continual cross-modal retrieval setting, where we study how the catastrophic interference caused by new tasks impacts the embedding spaces and the cross-modal alignment required for effective retrieval. We propose a general framework that decouples the training, indexing, and querying stages. We also identify and study different factors that may lead to forgetting, and propose tools to alleviate it. We find that the indexing stage plays an important role and that simply avoiding reindexing the database with updated embedding networks can lead to significant gains. We evaluate our methods on two image-text retrieval datasets, obtaining significant gains over the fine-tuning baseline.
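The decoupling of training, indexing, and querying described above can be illustrated with a minimal sketch. This is not the paper's code: the toy linear encoders, random data, and function names are all illustrative assumptions. The point it shows is the one the abstract makes: if the database is indexed once with the original encoder and kept frozen, later fine-tuning (which drifts the encoders) does not disturb the stored embeddings, whereas reindexing with the drifted encoder changes the database representation that older queries were aligned with.

```python
# Sketch of decoupled indexing vs. querying in continual cross-modal
# retrieval. All names and the toy encoders are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def embed(x, W):
    """Toy encoder: project features into the shared space, L2-normalize."""
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

# --- Training stage (task 1): learned projections (random stand-ins) ---
W_img_t1 = rng.standard_normal((64, 32))
W_txt_t1 = rng.standard_normal((64, 32))

# --- Indexing stage: embed the image database ONCE with the task-1 encoder ---
images = rng.standard_normal((100, 64))
frozen_index = embed(images, W_img_t1)   # kept fixed across later tasks

# --- Training stage (task 2): fine-tuning drifts the image encoder ---
W_img_t2 = W_img_t1 + 0.5 * rng.standard_normal((64, 32))

# --- Querying stage: retrieve by cosine similarity against an index ---
def retrieve(query_vec, index, k=5):
    scores = index @ query_vec           # cosine similarity (unit vectors)
    return np.argsort(-scores)[:k]

query = embed(rng.standard_normal(64), W_txt_t1)
hits_frozen = retrieve(query, frozen_index)

# Reindexing with the drifted task-2 encoder changes the database
# representation and can break alignment with queries from older tasks:
reindexed_index = embed(images, W_img_t2)
hits_reindexed = retrieve(query, reindexed_index)
print(hits_frozen, hits_reindexed)
```

Because indexing is a separate stage, the choice of which encoder snapshot populates the index becomes an explicit design decision rather than a side effect of fine-tuning.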


Related research

- Cross-Modal Scene Networks (10/27/2016): People can recognize scenes across many different modalities beyond natu...
- Continual Vision-Language Representation Learning with Off-Diagonal Information (05/11/2023): This paper discusses the feasibility of continuously training the CLIP m...
- CLiMB: A Continual Learning Benchmark for Vision-and-Language Tasks (06/18/2022): Current state-of-the-art vision-and-language models are evaluated on tas...
- VLDeformer: Vision-Language Decomposed Transformer for Fast Cross-Modal Retrieval (10/20/2021): Cross-model retrieval has emerged as one of the most important upgrades ...
- DSI++: Updating Transformer Memory with New Documents (12/19/2022): Differentiable Search Indices (DSIs) encode a corpus of documents in the...
- Advancing continual lifelong learning in neural information retrieval: definition, dataset, framework, and empirical evaluation (08/16/2023): Continual learning refers to the capability of a machine learning model ...
- Emotion Embedding Spaces for Matching Music to Stories (11/26/2021): Content creators often use music to enhance their stories, as it can be ...
