Phrase-Based Affordance Detection via Cyclic Bilateral Interaction

02/24/2022
by Liangsheng Lu, et al.

Affordance detection, which refers to perceiving objects with potential action possibilities in images, is a challenging task because the possible affordance depends on the person's purpose in real-world application scenarios. Existing works mainly extract the inherent human-object dependencies from images or videos to accommodate affordance properties that change dynamically. In this paper, we explore affordance perception from a vision-language perspective and consider the challenging phrase-based affordance detection problem, i.e., given a set of phrases describing the action purposes, all the object regions in a scene with the same affordance should be detected. To this end, we propose a cyclic bilateral consistency enhancement network (CBCE-Net) to align language and vision features progressively. Specifically, the presented CBCE-Net consists of a mutually guided vision-language module that progressively updates the common features of vision and language, and a cyclic interaction module (CIM) that facilitates the perception of possible interactions with objects in a cyclic manner. In addition, we extend the public Purpose-driven Affordance Dataset (PAD) by annotating affordance categories with short phrases. Comparative experimental results demonstrate the superiority of our method over nine typical methods from four relevant fields in terms of both objective metrics and visual quality. The related code and dataset will be released at <https://github.com/lulsheng/CBCE-Net>.

research
08/08/2021

One-Shot Object Affordance Detection in the Wild

Affordance detection refers to identifying the potential action possibil...
research
06/28/2021

One-Shot Affordance Detection

Affordance detection refers to identifying the potential action possibil...
research
10/01/2020

RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation

The task of video object segmentation with referring expressions (langua...
research
08/07/2020

Polysemy Deciphering Network for Robust Human-Object Interaction Detection

Human-Object Interaction (HOI) detection is important to human-centric s...
research
08/12/2021

Learning Visual Affordance Grounding from Demonstration Videos

Visual affordance grounding aims to segment all possible interaction reg...
research
04/17/2023

ViPLO: Vision Transformer based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection

Human-Object Interaction (HOI) detection, which localizes and infers rel...
research
10/21/2022

Describing Sets of Images with Textual-PCA

We seek to semantically describe a set of images, capturing both the att...
