A Real-time Global Inference Network for One-stage Referring Expression Comprehension

12/07/2019
by Yiyi Zhou, et al.

Referring Expression Comprehension (REC) is an emerging research topic in computer vision that aims to detect the target region in an image given a text description. Most existing REC methods follow a multi-stage pipeline, which is computationally expensive and greatly limits the application of REC. In this paper, we propose a one-stage model towards real-time REC, termed Real-time Global Inference Network (RealGIN). RealGIN addresses the diversity and complexity issues in REC with two innovative designs: Adaptive Feature Selection (AFS) and the Global Attentive ReAsoNing unit (GARAN). AFS adaptively fuses features at different semantic levels to handle the varying content of expressions. GARAN uses the textual feature as a pivot to collect expression-related visual information from all regions and then selectively diffuses this information back to all regions, which provides sufficient context for modeling the complex linguistic conditions in expressions. On five benchmark datasets, i.e., RefCOCO, RefCOCO+, RefCOCOg, ReferIt and Flickr30k, the proposed RealGIN outperforms most prior works and achieves very competitive performance against the most advanced method, i.e., MAttNet. Most importantly, on the same hardware, RealGIN can boost the processing speed by about 10 times over existing methods.
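The two ideas in the abstract (text-conditioned fusion of multi-level features, and a "collect, then diffuse" pass that uses the text feature as a pivot over all regions) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `adaptive_feature_selection` and `collect_and_diffuse`, the gating matrix `w_gate`, the scaled dot-product scoring, and the sigmoid diffusion gate are all hypothetical choices made for illustration, and every feature level is assumed to be already resized to the same N regions of dimension d.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def adaptive_feature_selection(levels, text, w_gate):
    """Hypothetical AFS-style fusion (assumption, not the paper's layer).

    levels: list of L arrays of shape (N, d), one per semantic level
    text:   (d,) expression feature
    w_gate: (d, L) hypothetical projection producing one score per level
    """
    weights = softmax(text @ w_gate)                  # (L,), sums to 1
    stacked = np.stack(levels, axis=0)                # (L, N, d)
    fused = np.tensordot(weights, stacked, axes=1)    # weighted sum -> (N, d)
    return fused, weights


def collect_and_diffuse(regions, text):
    """Hypothetical GARAN-style step (assumption, not the paper's layer).

    1) collect: the text feature attends over all regions -> one context vector
    2) diffuse: each region takes a gated share of that context back
    regions: (N, d), text: (d,)
    """
    d = regions.shape[1]
    alpha = softmax(regions @ text / np.sqrt(d))      # (N,) attention over regions
    context = alpha @ regions                         # (d,) expression-related context
    gate = 1.0 / (1.0 + np.exp(-(regions @ context) / np.sqrt(d)))  # (N,) in (0, 1)
    out = regions + gate[:, None] * context[None, :]  # selective diffusion
    return out, alpha, gate


# Toy usage with random features (illustrative sizes only).
rng = np.random.default_rng(0)
N, d, L = 16, 8, 3
levels = [rng.standard_normal((N, d)) for _ in range(L)]
text = rng.standard_normal(d)
w_gate = rng.standard_normal((d, L))

fused, w = adaptive_feature_selection(levels, text, w_gate)
out, alpha, gate = collect_and_diffuse(fused, text)
print(fused.shape, out.shape)  # (16, 8) (16, 8)
```

The sigmoid gate is one simple way to make the diffusion "selective": regions whose features align with the collected context receive more of it, while unrelated regions receive little. A learned projection or a different gating function would serve equally well in a sketch like this.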


research
07/31/2022

One for All: One-stage Referring Expression Comprehension with Dynamic Reasoning

Referring Expression Comprehension (REC) is one of the most important ta...
research
12/09/2018

Real-Time Referring Expression Comprehension by Single-Stage Grounding Network

In this paper, we propose a novel end-to-end model, namely Single-Stage ...
research
01/09/2023

Towards Real-Time Panoptic Narrative Grounding by an End-to-End Grounding Network

Panoptic Narrative Grounding (PNG) is an emerging cross-modal grounding ...
research
03/12/2022

Differentiated Relevances Embedding for Group-based Referring Expression Comprehension

Referring expression comprehension (REC) aims to locate a certain object...
research
03/19/2020

Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation

Referring expression comprehension (REC) and segmentation (RES) are two ...
research
09/16/2019

A Real-Time Cross-modality Correlation Filtering Method for Referring Expression Comprehension

Referring expression comprehension aims to localize the object instance ...
research
08/31/2023

3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation

In 3D Referring Expression Segmentation (3D-RES), the earlier approach a...
