Is Object Detection Necessary for Human-Object Interaction Recognition?

07/27/2021
by Ying Jin, et al.

This paper revisits human-object interaction (HOI) recognition at the image level without using supervision from object locations or human poses. We call this setting detection-free HOI recognition, in contrast to existing detection-supervised approaches, which rely on object and keypoint detections to achieve state-of-the-art results. With our method, not only is detection supervision avoidable, but superior performance can also be achieved by properly using image-text pre-training (such as CLIP) together with the proposed Log-Sum-Exp Sign (LSE-Sign) loss function. Specifically, initializing the linear classifier with text embeddings of the class labels is essential for leveraging the CLIP pre-trained image encoder. In addition, the LSE-Sign loss facilitates learning from multiple labels on an imbalanced dataset by normalizing gradients over all classes in softmax form. Surprisingly, our detection-free solution achieves 60.5 mAP on the HICO dataset, outperforming the detection-supervised state of the art by 13.4 mAP.
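
The abstract names two ingredients: a linear classifier whose weights start from text embeddings of the class labels, and the LSE-Sign loss. The following is a minimal sketch of both, written under several assumptions:

- It uses NumPy only. Random vectors stand in for CLIP text and image embeddings, and no CLIP model is loaded.
- The helper names `init_classifier_from_text` and `lse_sign_loss` are hypothetical.
- The logit scale of 10 is an assumed value.
- The loss form log(1 + Σ_k exp(−y_k s_k)), with y_k ∈ {−1, +1}, is only one reading of "normalizing gradients over all classes in softmax form". It is not a formula quoted from the paper.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit L2 norm along `axis`."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def init_classifier_from_text(text_embeddings):
    """Hypothetical helper: one text embedding per class label becomes the
    initial weight row of a bias-free linear classifier."""
    return l2_normalize(np.asarray(text_embeddings, dtype=np.float64))


def lse_sign_loss(logits, labels):
    """Assumed LSE-Sign form: log(1 + sum_k exp(-y_k * s_k)), y_k in {-1, +1}.

    logits: (K,) class scores s_k; labels: (K,) multi-hot targets in {0, 1}.
    Returns the scalar loss and its gradient with respect to the logits.
    """
    s = np.asarray(logits, dtype=np.float64)
    y = 2.0 * np.asarray(labels, dtype=np.float64) - 1.0  # {0, 1} -> {-1, +1}
    z = -y * s
    # A leading 0 term represents the "1 +" inside the log; then a stable log-sum-exp.
    z_all = np.concatenate(([0.0], z))
    m = z_all.max()
    loss = m + np.log(np.exp(z_all - m).sum())
    # Softmax weights over [0, z_1, ..., z_K]; the constant term's weight is dropped,
    # so the K per-class weights share one denominator and sum to less than 1.
    p = np.exp(z_all - loss)[1:]
    grad = -y * p  # d loss / d s_k
    return loss, grad


rng = np.random.default_rng(0)
K, D = 5, 8                                    # 5 hypothetical HOI classes, toy 8-dim embeddings
text_emb = rng.normal(size=(K, D))             # stand-in for CLIP text embeddings of label prompts
W = init_classifier_from_text(text_emb)        # classifier initialized from text embeddings
image_feat = l2_normalize(rng.normal(size=D))  # stand-in for a CLIP image embedding
logits = 10.0 * (W @ image_feat)               # cosine similarity times an assumed scale
labels = np.array([1, 0, 0, 1, 0])             # multi-label target: two positive classes

loss, grad = lse_sign_loss(logits, labels)
print("loss:", loss)
print("grad:", grad)
```

The gradient shows the contrast with per-class binary cross-entropy. Under binary cross-entropy, each class gets its own sigmoid-weighted gradient independently. Here, all K gradient magnitudes share a single softmax denominator, so their total stays below 1.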
