Tackling the Challenges in Scene Graph Generation with Local-to-Global Interactions

by   Sangmin Woo, et al.

In this work, we seek new insights into the underlying challenges of the Scene Graph Generation (SGG) task. Quantitative and qualitative analysis of the Visual Genome dataset implies – 1) Ambiguity: even if inter-object relationship contains the same object (or predicate), they may not be visually or semantically similar, 2) Asymmetry: despite the nature of the relationship that embodied the direction, it was not well addressed in previous studies, and 3) Higher-order contexts: leveraging the identities of certain graph elements can help to generate accurate scene graphs. Motivated by the analysis, we design a novel SGG framework, Local-to-Global Interaction Networks (LOGIN). Locally, interactions extract the essence between three instances - subject, object, and background - while baking direction awareness into the network by constraining the input order. Globally, interactions encode the contexts between every graph components – nodes and edges. Also we introduce Attract Repel loss which finely adjusts predicate embeddings. Our framework enables predicting the scene graph in a local-to-global manner by design, leveraging the possible complementariness. To quantify how much LOGIN is aware of relational direction, we propose a new diagnostic task called Bidirectional Relationship Classification (BRC). We see that LOGIN can successfully distinguish relational direction than existing methods (in BRC task) while showing state-of-the-art results on the Visual Genome benchmark (in SGG task).


page 1

page 3

page 5

page 10

page 13


LinkNet: Relational Embedding for Scene Graph

Objects and their relationships are critical contents for image understa...

Explaining Local, Global, And Higher-Order Interactions In Deep Learning

We present a simple yet highly generalizable method for explaining inter...

Scene Graph Generation via Conditional Random Fields

Despite the great success object detection and segmentation models have ...

Neural Motifs: Scene Graph Parsing with Global Context

We investigate the problem of producing structured graph representations...

A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval

Conventional approaches to image-text retrieval mainly focus on indexing...

DigNet: Digging Clues from Local-Global Interactive Graph for Aspect-level Sentiment Classification

In aspect-level sentiment classification (ASC), state-of-the-art models ...

Graph Transformer GANs for Graph-Constrained House Generation

We present a novel graph Transformer generative adversarial network (GTG...

Please sign up or login with your details

Forgot password? Click here to reset