Sketching Image Gist: Human-Mimetic Hierarchical Scene Graph Generation

07/17/2020
by Wenbin Wang, et al.

A scene graph aims to faithfully capture how humans perceive image content. When humans analyze a scene, they usually describe the image gist first, namely the major objects and key relations in the scene graph. This inherent perceptual habit implies that human preferences during scene parsing follow a hierarchical structure. We therefore argue that a desirable scene graph should also be constructed hierarchically, and we introduce a new scheme for modeling scene graphs. Concretely, a scene is represented by a human-mimetic Hierarchical Entity Tree (HET) consisting of a series of image regions. To generate a scene graph from the HET, we parse it with a Hybrid Long Short-Term Memory (Hybrid-LSTM) that specifically encodes hierarchy and sibling context to capture the structured information embedded in the HET. To further prioritize key relations in the scene graph, we devise a Relation Ranking Module (RRM) that dynamically adjusts their rankings by learning to capture humans' subjective perceptual habits from objective entity saliency and size. Experiments indicate that our method not only achieves state-of-the-art performance for scene graph generation, but also excels at mining image-specific relations, which play an important role in downstream tasks.
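To make the idea of a hierarchical entity tree over image regions more concrete, here is a minimal sketch that nests bounding boxes by containment. It is not the authors' construction: the abstract does not say how the HET is built, so the containment rule, the `cover` threshold, and the names `Node` and `build_tree` are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Node:
    name: str
    box: Optional[Box]  # None for the virtual root standing for the whole image
    children: List["Node"] = field(default_factory=list)


def area(b: Box) -> float:
    # Degenerate or inverted boxes count as empty.
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection(a: Box, b: Box) -> float:
    return area((max(a[0], b[0]), max(a[1], b[1]),
                 min(a[2], b[2]), min(a[3], b[3])))


def build_tree(entities: List[Tuple[str, Box]], cover: float = 0.8) -> Node:
    """Attach each entity to the smallest larger entity that covers it.

    Assumption: an entity is a child of another when at least `cover`
    of its area lies inside that entity's box.
    """
    root = Node("image", None)
    placed: List[Node] = []
    # Visit large regions first so that parents exist before their children.
    for name, box in sorted(entities, key=lambda e: area(e[1]), reverse=True):
        node = Node(name, box)
        parent = root
        # `placed` runs from large to small, so the last match is the
        # smallest containing region.
        for cand in placed:
            if area(box) > 0 and intersection(cand.box, box) / area(box) >= cover:
                parent = cand
        parent.children.append(node)
        placed.append(node)
    return root
```

For example, given a `street` box covering the image, a `person` and a `car` inside it, and a `hat` inside the `person` box, the sketch places `person` and `car` under `street` and `hat` under `person`.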


