Relationformer: A Unified Framework for Image-to-Graph Generation

03/19/2022
by Suprosanna, et al.

A comprehensive representation of an image requires understanding objects and their mutual relationships, especially in image-to-graph generation tasks such as road network extraction, blood-vessel network extraction, and scene graph generation. Traditionally, image-to-graph generation is addressed with a two-stage approach: object detection followed by a separate relation prediction step, which prevents simultaneous object-relation interaction. This work proposes a unified one-stage transformer-based framework, Relationformer, that jointly predicts objects and their relations. We leverage direct set-based object prediction and incorporate the interaction among the objects to learn an object-relation representation jointly. In addition to the existing [obj]-tokens, we propose a novel learnable token, the [rln]-token. Together with the [obj]-tokens, the [rln]-token exploits local and global semantic reasoning in an image through a series of mutual associations. Combined with pair-wise [obj]-tokens, the [rln]-token contributes to computationally efficient relation prediction. We achieve state-of-the-art performance on multiple diverse, multi-domain datasets, which demonstrates our approach's effectiveness and generalizability.
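To make the pair-wise relation step more concrete, the following is a minimal sketch of how a single shared [rln]-token might be combined with every ordered pair of [obj]-tokens and scored by a relation head. The abstract does not specify these details, so everything here is an assumption for illustration: the concatenation order, the single linear layer standing in for the relation head, the toy dimensions, and the function names `pairwise_relation_inputs` and `predict_relations` are all hypothetical and not taken from the Relationformer code.

```python
import itertools
import random


def pairwise_relation_inputs(obj_tokens, rln_token):
    """Concatenate each ordered pair of [obj]-tokens with the shared [rln]-token.

    Assumption: features are joined as [obj_i, obj_j, rln]; the abstract does
    not state how the tokens are actually combined.
    """
    pairs = {}
    for i, j in itertools.permutations(range(len(obj_tokens)), 2):
        pairs[(i, j)] = obj_tokens[i] + obj_tokens[j] + rln_token
    return pairs


def linear(x, weights, bias):
    """Plain linear layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def predict_relations(obj_tokens, rln_token, weights, bias):
    """Return {(i, j): predicted relation class} for every ordered object pair.

    Hypothetical relation head: a single linear layer followed by argmax.
    """
    predictions = {}
    for pair, feature in pairwise_relation_inputs(obj_tokens, rln_token).items():
        logits = linear(feature, weights, bias)
        predictions[pair] = max(range(len(logits)), key=logits.__getitem__)
    return predictions


# Toy setup with random values standing in for decoder outputs and trained weights.
rng = random.Random(0)
dim, num_objects, num_classes = 4, 3, 2
obj_tokens = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_objects)]
rln_token = [rng.uniform(-1, 1) for _ in range(dim)]
weights = [[rng.uniform(-1, 1) for _ in range(3 * dim)] for _ in range(num_classes)]
bias = [0.0] * num_classes

relations = predict_relations(obj_tokens, rln_token, weights, bias)
for (i, j), cls in sorted(relations.items()):
    print(f"object {i} -> object {j}: relation class {cls}")
```

In this sketch the relation-specific context is carried by one token that is reused for all N(N-1) ordered pairs, so only the lightweight head runs per pair. That is one plausible reading of the computational efficiency mentioned above, not a confirmed description of the method.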


