Deconfounded Image Captioning: A Causal Retrospect

03/09/2020
by Xu Yang, et al.

Dataset bias in vision-language tasks is becoming one of the main problems hindering the progress of our community. However, recent studies lack a principled analysis of the bias. In this paper, we present a novel perspective, Deconfounded Image Captioning (DIC), to find the cause of the bias in image captioning, then retrospect modern neural image captioners, and finally propose a DIC framework: DICv1.0. DIC is based on causal inference, whose two principles, the backdoor and front-door adjustments, help us review previous works and design effective models. In particular, we showcase that DICv1.0 can strengthen two prevailing captioning models and achieve a single-model 130.7 CIDEr-D and 128.4 c40 CIDEr-D on the Karpathy split and the online split of the challenging MS-COCO dataset, respectively. Last but not least, DICv1.0 is merely a natural derivation from our causal retrospect, which opens a promising direction for image captioning.
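
For readers unfamiliar with the backdoor adjustment named in the abstract, the following is a minimal sketch of the textbook formula P(Y | do(X=x)) = Σ_z P(Y | x, z) P(z) on a toy binary example. It is not the DICv1.0 model: the variables Z, X, Y, all probability values, and the function names are hypothetical and chosen only to show how adjusting for a confounder changes the estimate compared with plain conditioning.

```python
# Toy causal graph (an assumption for illustration): Z -> X, Z -> Y, X -> Y, all binary.
# All numbers below are made up; they do not come from the paper.
p_z = {0: 0.5, 1: 0.5}                      # P(Z=z)
p_x1_given_z = {0: 0.2, 1: 0.8}             # P(X=1 | Z=z)
p_y1_given_xz = {                           # P(Y=1 | X=x, Z=z), keyed by (x, z)
    (0, 0): 0.1, (1, 0): 0.3,
    (0, 1): 0.5, (1, 1): 0.7,
}


def p_x_given_z(x, z):
    """P(X=x | Z=z) for binary X."""
    p1 = p_x1_given_z[z]
    return p1 if x == 1 else 1.0 - p1


def observational(x):
    """P(Y=1 | X=x): conditioning weights each z by P(z | x), so Z's bias leaks in."""
    p_x = sum(p_x_given_z(x, z) * p_z[z] for z in p_z)
    return sum(
        p_y1_given_xz[(x, z)] * p_x_given_z(x, z) * p_z[z] / p_x for z in p_z
    )


def backdoor(x):
    """P(Y=1 | do(X=x)): the backdoor adjustment weights each z by its prior P(z)."""
    return sum(p_y1_given_xz[(x, z)] * p_z[z] for z in p_z)


if __name__ == "__main__":
    for x in (0, 1):
        print(f"x={x}  observational={observational(x):.2f}  backdoor={backdoor(x):.2f}")
```

In this toy setting the observational gap between X=1 and X=0 is larger than the adjusted gap, because Z raises both the chance of X=1 and the chance of Y=1; the adjustment removes that spurious part.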

research
03/29/2022

Quantifying Societal Bias Amplification in Image Captioning

We study societal bias amplification in image captioning. Image captioni...
research
05/07/2023

UIT-OpenViIC: A Novel Benchmark for Evaluating Image Captioning in Vietnamese

Image Captioning is one of the vision-language tasks that still interest...
research
12/06/2018

Auto-Encoding Graphical Inductive Bias for Descriptive Image Captioning

We propose Scene Graph Auto-Encoder (SGAE) that incorporates the languag...
research
12/06/2018

Auto-Encoding Scene Graphs for Image Captioning

We propose Scene Graph Auto-Encoder (SGAE) that incorporates the languag...
research
10/04/2021

Let there be a clock on the beach: Reducing Object Hallucination in Image Captioning

Explaining an image with missing or non-existent objects is known as obj...
research
06/15/2020

Mitigating Gender Bias in Captioning Systems

Image captioning has made substantial progress with huge supporting imag...
research
08/13/2022

ExpansionNet v2: Block Static Expansion in fast end to end training for Image Captioning

Expansion methods explore the possibility of performance bottlenecks in ...
