Dual Relation Alignment for Composed Image Retrieval

09/05/2023
by Xintong Jiang, et al.

Composed image retrieval, the task of searching for a target image using a reference image and a complementary text as the query, has witnessed significant advances owing to progress in cross-modal modeling. Unlike general image-text retrieval, which involves a single alignment relation (image-text), we argue that two types of relations exist in composed image retrieval. The explicit relation links the combination of the reference image and the complementary text to the target image; this is the relation commonly exploited by existing methods. Beyond this intuitive relation, our observations in practice uncovered another implicit yet crucial relation, which links the combination of the reference image and the target image to the complementary text: the complementary text can be inferred by studying how the target image differs from the reference image. Regrettably, existing methods largely focus on the explicit relation when learning their networks and overlook the implicit one. To address this weakness, we propose a new framework for composed image retrieval, termed dual relation alignment, which integrates both explicit and implicit relations to fully exploit the correlations within each triplet. Specifically, we design a vision compositor that first fuses the reference image and the target image; the resulting representation then serves two roles: (1) a counterpart for semantic alignment with the complementary text, and (2) a compensation for the complementary text that boosts explicit relation modeling, thereby implanting the implicit relation into the alignment learning. We evaluate our method on two popular datasets, CIRR and FashionIQ, through extensive experiments. The results confirm that dual-relation learning substantially enhances composed image retrieval performance.
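To make the described pipeline concrete, below is a minimal PyTorch sketch of how the two alignment relations could be trained jointly with an in-batch contrastive loss. It assumes pre-extracted reference-image, complementary-text, and target-image embeddings (e.g., from a CLIP-style backbone); the Compositor module, the info_nce and dual_relation_loss functions, the fusion MLP, and the equal loss weighting are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Compositor(nn.Module):
    """Fuses two embeddings into a single normalized joint embedding."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.mlp(torch.cat([a, b], dim=-1)), dim=-1)


def info_nce(query: torch.Tensor, key: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """Symmetric in-batch contrastive loss over normalized embeddings."""
    logits = query @ key.t() / tau
    labels = torch.arange(query.size(0), device=query.device)
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))


def dual_relation_loss(ref, txt, tgt, query_comp: Compositor, vision_comp: Compositor):
    # Implicit relation: fuse reference and target images, then align the fused
    # vision feature with the complementary text.
    v = vision_comp(ref, tgt)
    implicit = info_nce(v, txt)

    # Explicit relation: fuse the reference image with the text (here compensated
    # by the fused vision feature) and align the result with the target image.
    q = query_comp(ref, txt + v.detach())
    explicit = info_nce(q, tgt)
    return explicit + implicit


if __name__ == "__main__":
    dim, batch = 512, 8
    # Stand-in embeddings; in practice these come from the vision-language backbone.
    ref, txt, tgt = (F.normalize(torch.randn(batch, dim), dim=-1) for _ in range(3))
    loss = dual_relation_loss(ref, txt, tgt, Compositor(dim), Compositor(dim))
    loss.backward()
```

Detaching the fused vision feature when it compensates the text is one plausible way to keep the implicit alignment branch from collapsing into the explicit one; the paper's actual fusion architecture and loss weighting may differ.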

Related research:

03/15/2022 · ARTEMIS: Attention-based Retrieval with Text-Explicit Matching and Implicit Similarity
An intuitive way to search for images is to use queries composed of an e...

03/29/2023 · Bi-directional Training for Composed Image Retrieval via Text Prompt Learning
Composed image retrieval searches for a target image based on a multi-mo...

05/17/2023 · Self-Training Boosted Multi-Faceted Matching Network for Composed Image Retrieval
The composed image retrieval (CIR) task aims to retrieve the desired tar...

07/09/2022 · BOSS: Bottom-up Cross-modal Semantic Composition with Hybrid Counterfactual Training for Robust Content-based Image Retrieval
Content-Based Image Retrieval (CIR) aims to search for a target image by...

09/04/2023 · Target-Guided Composed Image Retrieval
Composed image retrieval (CIR) is a new and flexible image retrieval par...

05/25/2023 · Candidate Set Re-ranking for Composed Image Retrieval with Dual Multi-modal Encoder
Composed image retrieval aims to find an image that best matches a given...

08/16/2023 · Ranking-aware Uncertainty for Text-guided Image Retrieval
Text-guided image retrieval is to incorporate conditional text to better...
