Attribute-Guided Multi-Level Attention Network for Fine-Grained Fashion Retrieval

by Ling Xiao, et al.

This paper proposes an attribute-guided multi-level attention network (AG-MLAN) to learn fine-grained fashion similarity. Guided by a specified attribute, AG-MLAN localizes that attribute more accurately and captures more discriminative features. The network contains two branches: branch 1 forces the model to recognize different attributes, while branch 2 learns multiple attribute-specific embedding spaces for measuring fine-grained similarity. We first modify the Convolutional Neural Network (CNN) backbone to extract hierarchical feature representations, which are then passed to branch 1 for attribute classification and to branch 2 for multi-level feature extraction. In branch 2, we propose a multi-level attention module that extracts a more discriminative representation under the guidance of a specific attribute, and then adopt a masked embedding module to learn an attribute-aware embedding. Finally, AG-MLAN is trained with a weighted combination of the classification loss from branch 1 and the triplet loss on the masked embedding features from branch 2, which further improves the accuracy of attribute localization. Extensive experiments on the DeepFashion, FashionAI, and Zappos50k datasets show the effectiveness of AG-MLAN for fine-grained fashion similarity learning and its potential for attribute-guided retrieval tasks. The proposed AG-MLAN outperforms state-of-the-art methods on the fine-grained fashion similarity retrieval task.
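The two-branch training objective described above can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption: the function names, the use of softmax cross-entropy for branch 1, the hinge triplet loss on squared Euclidean distances for branch 2, the margin of 0.2, and the single scalar `weight` applied to the classification term.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample (stand-in for the branch 1 loss)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss (stand-in for the branch 2 loss).

    The margin value is an assumption, not taken from the paper.
    """
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def combined_loss(logits, label, anchor, positive, negative,
                  weight=0.5, margin=0.2):
    """Weighted sum of both branch losses.

    Where the weight is applied and its value are assumptions.
    """
    cls = cross_entropy(logits, label)
    tri = triplet_loss(anchor, positive, negative, margin)
    return weight * cls + tri


# Toy inputs: 4 attribute classes, 2-dimensional embeddings.
loss = combined_loss(
    logits=[2.0, 0.5, 0.1, -1.0], label=0,
    anchor=[1.0, 0.0], positive=[0.9, 0.1], negative=[0.0, 1.0],
)
print(f"combined loss: {loss:.4f}")
```

In this sketch the triplet term drops to zero once the negative is farther from the anchor than the positive by at least the margin, so the classification term is what keeps supervising attribute recognition for triplets that are already well separated.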


Fine-Grained Fashion Similarity Prediction by Attribute-Specific Embedding Learning

This paper strives to predict fine-grained fashion similarity. In this s...

Embedding Label Structures for Fine-Grained Feature Representation

Recent algorithms in convolutional neural networks (CNN) considerably ad...

Conditional Cross Attention Network for Multi-Space Embedding without Entanglement in Only a SINGLE Network

Many studies in vision tasks have aimed to create effective embedding sp...

Hierarchical Feature Embedding for Attribute Recognition

Attribute recognition is a crucial but challenging task due to viewpoint...

Where to Look and How to Describe: Fashion Image Retrieval with an Attentional Heterogeneous Bilinear Network

Fashion products typically feature in compositions of a variety of style...

From Region to Patch: Attribute-Aware Foreground-Background Contrastive Learning for Fine-Grained Fashion Retrieval

Attribute-specific fashion retrieval (ASFR) is a challenging information...

Attribute-Aware Attention Model for Fine-grained Representation Learning

How to learn a discriminative fine-grained representation is a key point...
