Human-Object Interaction Detection via Disentangled Transformer

04/20/2022
by Desen Zhou, et al.

Human-Object Interaction (HOI) detection tackles the problem of jointly localizing and classifying human-object interactions. Existing HOI transformers either adopt a single decoder for triplet prediction, or use two parallel decoders to detect individual objects and interactions separately and then compose triplets through a matching process. In contrast, we decouple triplet prediction into human-object pair detection and interaction classification. Our main motivation is that accurately detecting human-object instances and accurately classifying interactions require representations that focus on different regions. To this end, we present the Disentangled Transformer, in which both the encoder and the decoder are disentangled to facilitate learning of the two sub-tasks. To associate the predictions of the disentangled decoders, we first generate a unified representation for HOI triplets with a base decoder, and then use it as the input feature of each disentangled decoder. Extensive experiments show that our method outperforms prior work on two public HOI benchmarks by a sizeable margin. Code will be available.
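The cascade described above (a base decoder whose output feeds two disentangled decoders) can be sketched in plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: each "decoder" is reduced to a single parameter-free cross-attention step with a residual connection (no learned projections, self-attention, or feed-forward layers), and the three memory lists are hypothetical stand-ins for encoder outputs. The names cross_attend and disentangled_decode are made up for this example. What the sketch shows is the association mechanism: because both disentangled decoders start from the same unified query features, output i of the instance decoder and output i of the interaction decoder refer to the same HOI triplet, so in this sketch they are paired by index.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries: List[Vector], memory: List[Vector]) -> List[Vector]:
    """Toy decoder step (assumption): single-head scaled dot-product
    cross-attention from queries to memory, plus a residual add.
    Memory vectors serve as both keys and values; no learned weights."""
    d = len(memory[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in memory])
        attended = [sum(w * k[j] for w, k in zip(weights, memory)) for j in range(d)]
        out.append([qj + aj for qj, aj in zip(q, attended)])
    return out


def disentangled_decode(
    queries: List[Vector],
    shared_memory: List[Vector],
    instance_memory: List[Vector],
    interaction_memory: List[Vector],
) -> List[Tuple[Vector, Vector]]:
    # Base decoder: one unified representation per HOI triplet query.
    unified = cross_attend(queries, shared_memory)
    # Both disentangled decoders take the SAME unified features as input,
    # so index i in each output describes the same triplet.
    instance_feats = cross_attend(unified, instance_memory)
    interaction_feats = cross_attend(unified, interaction_memory)
    return list(zip(instance_feats, interaction_feats))


if __name__ == "__main__":
    queries = [[1.0, 0.0], [0.0, 1.0]]          # two triplet queries
    shared = [[0.5, 0.5], [1.0, -1.0]]          # hypothetical shared features
    instance_mem = [[2.0, 0.0], [0.0, 2.0]]     # hypothetical instance-branch features
    interaction_mem = [[1.0, 1.0]]              # hypothetical interaction-branch features
    for i, (inst, inter) in enumerate(
        disentangled_decode(queries, shared, instance_mem, interaction_mem)
    ):
        print(i, inst, inter)
```

In the usage lines the interaction memory holds a single vector, so attention puts all of its weight there and each interaction feature is simply its unified feature plus [1.0, 1.0]. A real model would replace cross_attend with stacked transformer decoder layers and attach box and class heads to each branch.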

