Graphical Contrastive Losses for Scene Graph Generation

03/07/2019
by   Ji Zhang, et al.

Most scene graph generators use a two-stage pipeline to detect visual relationships: the first stage detects entities, and the second predicts the predicate for each entity pair using a softmax distribution. We find that such pipelines, trained with only a cross-entropy loss over predicate classes, suffer from two common errors. The first, Entity Instance Confusion, occurs when the model confuses multiple instances of the same type of entity (e.g., multiple cups). The second, Proximal Relationship Ambiguity, arises when multiple subject-predicate-object triplets with the same predicate appear in close proximity, and the model struggles to infer the correct subject-object pairings (e.g., mis-pairing musicians and their instruments). We propose a set of contrastive loss formulations that specifically target these types of errors within the scene graph generation problem, collectively termed the Graphical Contrastive Losses. These losses explicitly force the model to disambiguate related and unrelated instances through margin constraints specific to each type of confusion. We further construct a relationship detector, called RelDN, using the aforementioned pipeline to demonstrate the efficacy of our proposed losses. Our model outperforms the winning method of the OpenImages Relationship Detection Challenge by 4.7% (16.5% relative) on the test set. We also show improved results over the best previous methods on the Visual Genome and Visual Relationship Detection datasets.
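The abstract does not spell out the loss formulas, so the following is only a minimal sketch of what a margin constraint between related and unrelated instances might look like, not the formulation used in the paper. It assumes a hypothetical affinity score for every subject-object pair and, for each subject, penalizes the case where its weakest related object does not outscore its strongest unrelated object by at least a margin. The function name, the `affinity` and `related` inputs, and the default margin of 0.2 are all illustrative assumptions.

```python
from typing import List, Sequence


def margin_contrastive_loss(
    affinity: Sequence[Sequence[float]],
    related: Sequence[Sequence[bool]],
    margin: float = 0.2,  # assumed value, chosen only for illustration
) -> float:
    """Mean hinge loss on the gap between related and unrelated pairs.

    affinity[i][j]: hypothetical score that subject i is related to object j.
    related[i][j]:  True when the ground truth links subject i and object j.
    """
    losses: List[float] = []
    for scores, labels in zip(affinity, related):
        positives = [s for s, r in zip(scores, labels) if r]
        negatives = [s for s, r in zip(scores, labels) if not r]
        if not positives or not negatives:
            continue  # nothing to contrast for this subject
        # Gap between the weakest related and the strongest unrelated object.
        gap = min(positives) - max(negatives)
        losses.append(max(0.0, margin - gap))
    return sum(losses) / len(losses) if losses else 0.0


# Two subjects and two objects of the same class (e.g. two cups).
# Subject 0 is clearly paired with object 0; subject 1 is confused.
affinity = [[0.9, 0.1], [0.5, 0.45]]
related = [[True, False], [False, True]]
loss = margin_contrastive_loss(affinity, related)
```

In this toy case only the second subject contributes to the loss, because its related object scores below the unrelated one. A term of this kind would be added to the predicate cross-entropy loss rather than replace it.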


