No-Frills Human-Object Interaction Detection: Factorization, Appearance and Layout Encodings, and Training Techniques

11/14/2018
by   Tanmay Gupta, et al.

We show that with an appropriate factorization, and with encodings of layout and appearance constructed from the outputs of pretrained object detectors, a relatively simple model outperforms more sophisticated approaches on human-object interaction detection. Our model includes factors for detection scores, human and object appearance, coarse layout (box-pair configuration), and, optionally, fine-grained layout (human pose). We also develop training techniques that improve learning efficiency by: (i) eliminating train-inference mismatch; (ii) rejecting easy negatives during mini-batch training; and (iii) constructing training mini-batches with a ratio of negatives to positives that is two orders of magnitude larger than in existing approaches. We conduct a thorough ablation study on the challenging HICO-Det dataset to understand the importance of the different factors and training techniques.
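To make the idea of a factorized score concrete, the following is a minimal sketch of one plausible reading of the abstract. It assumes that each factor (human appearance, object appearance, box-pair configuration, and optionally human pose) contributes a logit for a given interaction class, that the logits are summed and passed through a sigmoid, and that the result is multiplied by the two detection scores. The combination rule, the function name `hoi_score`, the factor keys, and all numbers are illustrative assumptions, not details given in the abstract.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real-valued logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def hoi_score(human_det: float, object_det: float, factor_logits: dict) -> float:
    """Combine detector confidences with an interaction probability.

    Assumption (not stated in the abstract): per-factor logits are summed
    and squashed with a sigmoid, then multiplied by both detection scores.
    """
    interaction_prob = sigmoid(sum(factor_logits.values()))
    return human_det * object_det * interaction_prob


# Hypothetical values for one human-object box pair and one interaction class.
logits = {
    "human_appearance": 0.8,
    "object_appearance": 0.3,
    "box_configuration": 1.1,  # coarse layout
    "human_pose": -0.2,        # optional fine-grained layout
}
score = hoi_score(human_det=0.9, object_det=0.8, factor_logits=logits)
print(f"{score:.3f}")
```

Under this reading, dropping the optional pose factor amounts to omitting the `human_pose` entry, and a weak detection score for either box lowers the final score regardless of how strong the interaction factors are.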


