GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection

03/26/2022
by Yue Liao, et al.

The task of Human-Object Interaction (HOI) detection can be divided into two core problems: human-object association and interaction understanding. In this paper, we reveal and address the disadvantages of conventional query-driven HOI detectors from these two aspects. For association, previous two-branch methods suffer from complex and costly post-matching, while single-branch methods ignore the distinction between the features required by different tasks. We propose the Guided-Embedding Network (GEN) to attain a two-branch pipeline without post-matching. In GEN, we design an instance decoder that detects humans and objects with two independent query sets, together with a position Guided Embedding (p-GE) that marks the human and the object at the same query position as a pair. In addition, we design an interaction decoder to classify interactions, where the interaction queries are made of instance Guided Embeddings (i-GE) generated from the outputs of each instance decoder layer. For interaction understanding, previous methods suffer from the long-tailed distribution of interactions and struggle with zero-shot discovery. This paper proposes a Visual-Linguistic Knowledge Transfer (VLKT) training strategy that enhances interaction understanding by transferring knowledge from CLIP, a visual-linguistic pre-trained model. Specifically, we extract text embeddings for all labels with CLIP to initialize the classifier, and we adopt a mimic loss to minimize the visual feature distance between GEN and CLIP. As a result, GEN-VLKT outperforms the state of the art by large margins on multiple datasets, e.g., +5.05 mAP on HICO-Det. The source code is available at https://github.com/YueLiao/gen-vlkt.
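The pairing and knowledge-transfer ideas in the abstract can be illustrated with a toy sketch in plain Python. This is not the authors' implementation. Every function name is hypothetical. The following details are assumptions chosen for illustration: forming the i-GE interaction query by averaging the paired human and object decoder outputs, using cosine-similarity logits with a temperature of 0.01, and using a mean L1 distance as the mimic loss. The "text embeddings" are small toy vectors standing in for real CLIP label embeddings.

```python
import math
from typing import List, Tuple

Vector = List[float]


def pair_by_position(humans: List[Vector], objects: List[Vector]) -> List[Tuple[Vector, Vector]]:
    """p-GE idea: the k-th human query and the k-th object query form one pair,
    so no post-hoc matching between the two branches is needed."""
    if len(humans) != len(objects):
        raise ValueError("human and object query sets must have the same size")
    return list(zip(humans, objects))


def interaction_query(h: Vector, o: Vector) -> Vector:
    """Hypothetical i-GE: element-wise average of a paired human/object output."""
    return [(a + b) / 2.0 for a, b in zip(h, o)]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (leave an all-zero vector unchanged)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def classify(feature: Vector, text_embeddings: List[Vector], temperature: float = 0.01) -> List[float]:
    """Cosine-similarity logits against classifier rows initialized from
    label text embeddings (toy stand-ins for CLIP embeddings here)."""
    f = l2_normalize(feature)
    return [sum(a * b for a, b in zip(f, l2_normalize(t))) / temperature
            for t in text_embeddings]


def mimic_loss(gen_feature: Vector, clip_feature: Vector) -> float:
    """Assumed mimic loss: mean absolute (L1) distance between GEN and CLIP visual features."""
    return sum(abs(a - b) for a, b in zip(gen_feature, clip_feature)) / len(gen_feature)


if __name__ == "__main__":
    humans = [[1.0, 0.0], [0.0, 1.0]]    # toy human decoder outputs
    objects = [[1.0, 0.2], [0.2, 1.0]]   # toy object decoder outputs
    labels = [[1.0, 0.0], [0.0, 1.0]]    # toy text embeddings for two interaction labels
    queries = [interaction_query(h, o) for h, o in pair_by_position(humans, objects)]
    print([max(range(len(labels)), key=classify(q, labels).__getitem__) for q in queries])  # [0, 1]
    print(mimic_loss([1.0, 2.0], [0.5, 2.5]))  # 0.5
```

The point of the position-based pairing is that a human and an object are tied together by their shared query index rather than by a separate matching step. Initializing the classifier from label text embeddings lets label semantics shape the decision boundaries. In a real system these vectors would be high-dimensional tensors and the classifier rows would be learnable.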
