Logical Implications for Visual Question Answering Consistency

03/16/2023
by   Sergio Tascon-Morales, et al.

Despite considerable recent progress in Visual Question Answering (VQA) models, inconsistent or contradictory answers continue to cast doubt on their true reasoning capabilities. Most methods proposed to address this problem rely on indirect strategies or on strong assumptions about pairs of questions and answers to enforce model consistency. Instead, we propose a novel strategy intended to improve model performance by directly reducing logical inconsistencies. To do this, we introduce a new consistency loss term that can be used by a wide range of VQA models and that relies on knowing the logical relation between pairs of questions and answers. While such information is typically not available in VQA datasets, we propose to infer these logical relations using a dedicated language model and to use them in our proposed consistency loss function. We conduct extensive experiments on the VQA Introspect and DME datasets and show that our method brings improvements to state-of-the-art VQA models, while being robust across different architectures and settings.
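
To make the idea of a consistency loss over logically related question-answer pairs more concrete, the following is a minimal illustrative sketch, not the loss defined in the paper. It assumes that each related pair has already been reduced to two probabilities: the model's confidence that the premise proposition holds and its confidence that the implied proposition holds. The function names, the product-based violation score, and the weight `lam` are all assumptions made for this example.

```python
import math


def implication_violation(p_premise: float, p_consequence: float) -> float:
    """Soft score for how strongly 'premise -> consequence' is violated.

    An implication is only broken when the premise holds and the
    consequence does not, so p * (1 - q) is used here as a soft proxy.
    This choice is an assumption of the sketch, not the paper's formula.
    """
    return p_premise * (1.0 - p_consequence)


def consistency_loss(pairs, eps: float = 1e-7) -> float:
    """Mean negative log of (1 - violation) over related pairs.

    `pairs` is a list of (p_premise, p_consequence) tuples, one per
    pair of questions whose logical relation is an implication.
    """
    if not pairs:
        return 0.0
    total = 0.0
    for p_premise, p_consequence in pairs:
        violation = implication_violation(p_premise, p_consequence)
        # Clamp to avoid log(0) when the violation score reaches 1.
        total += -math.log(max(1.0 - violation, eps))
    return total / len(pairs)


def total_loss(vqa_loss: float, pairs, lam: float = 0.5) -> float:
    """Combine a standard VQA loss with the consistency term (hypothetical weighting)."""
    return vqa_loss + lam * consistency_loss(pairs)
```

In this sketch a pair in which the model is confident about the premise but not about the implied proposition, such as `(0.9, 0.1)`, is penalized far more than the reverse case `(0.1, 0.9)`, which does not contradict the implication. Because the term is simply added to an existing loss, it does not depend on a particular model architecture.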

research
02/19/2020

VQA-LOL: Visual Question Answering under the Lens of Logic

Logical connectives and their implications on the meaning of a natural l...
research
07/08/2020

IQ-VQA: Intelligent Visual Question Answering

Even though there has been tremendous progress in the field of Visual Qu...
research
11/19/2020

Logically Consistent Loss for Visual Question Answering

Given an image, a back-ground knowledge, and a set of questions about an...
research
06/27/2022

Consistency-preserving Visual Question Answering in Medical Imaging

Visual Question Answering (VQA) models take an image and a natural-langu...
research
09/10/2019

Sunny and Dark Outside?! Improving Answer Consistency in VQA through Entailed Question Generation

While models for Visual Question Answering (VQA) have steadily improved ...
research
07/26/2023

LOIS: Looking Out of Instance Semantics for Visual Question Answering

Visual question answering (VQA) has been intensively studied as a multim...
research
08/02/2017

A Simple Loss Function for Improving the Convergence and Accuracy of Visual Question Answering Models

Visual question answering as recently proposed multimodal learning task ...
