Label2Label: A Language Modeling Framework for Multi-Attribute Learning

07/18/2022
by Wanhua Li, et al.

Objects are usually associated with multiple attributes, and these attributes often exhibit high correlations. Modeling the complex relationships between attributes poses a great challenge for multi-attribute learning. This paper proposes a simple yet generic framework named Label2Label to exploit these attribute correlations. Label2Label is the first attempt at multi-attribute prediction from the perspective of language modeling. Specifically, it treats each attribute label as a "word" describing the sample. As each sample is annotated with multiple attribute labels, these "words" naturally form an unordered but meaningful "sentence", which depicts the semantic information of the corresponding sample. Inspired by the remarkable success of pre-trained language models in NLP, Label2Label introduces an image-conditioned masked language model, which randomly masks some of the "word" tokens in the label "sentence" and aims to recover them from the masked "sentence" and the context conveyed by image features. Our intuition is that the instance-wise attribute relations are well grasped if the neural net can infer the missing attributes from the context and the remaining attribute hints. Label2Label is conceptually simple and empirically powerful. Without incorporating task-specific prior knowledge or highly specialized network designs, our approach achieves state-of-the-art results on three different multi-attribute learning tasks, compared to highly customized domain-specific methods. Code is available at https://github.com/Li-Wanhua/Label2Label.
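The "word" and masking idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the token scheme (attribute i with binary value v becomes word id 2*i + v), the `MASK` id, and the function names `labels_to_words` and `mask_sentence` are all assumptions made for illustration, and the image-conditioned model that would recover the masked words is omitted.

```python
import random

MASK = -1  # hypothetical id standing in for a [MASK] token


def labels_to_words(labels):
    """Turn binary attribute labels into "word" ids.

    Assumed scheme: attribute i with value v (0 or 1) maps to id 2*i + v,
    so every (attribute, value) pair has its own word.
    """
    return [2 * i + v for i, v in enumerate(labels)]


def mask_sentence(words, mask_prob, rng):
    """Replace each word with MASK independently with probability mask_prob.

    Returns the masked sentence and the list of masked positions, which a
    model would be trained to recover from the remaining words and the image.
    """
    masked, positions = [], []
    for idx, word in enumerate(words):
        if rng.random() < mask_prob:
            masked.append(MASK)
            positions.append(idx)
        else:
            masked.append(word)
    return masked, positions


# One sample annotated with five binary attributes.
labels = [1, 0, 0, 1, 1]
words = labels_to_words(labels)
masked, positions = mask_sentence(words, 0.4, random.Random(0))
print(words, masked, positions)
```

In a training loop under these assumptions, the masked sentence and the image features would be fed to the model, and the loss would be computed only at the masked positions.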

