Which visual questions are difficult to answer? Analysis with Entropy of Answer Distributions

04/12/2020
by   Kento Terao, et al.

We propose a novel approach to identifying the difficulty of visual questions in Visual Question Answering (VQA) without direct supervision or annotations of difficulty. Prior works have considered the diversity of ground-truth answers given by human annotators. In contrast, we analyze the difficulty of visual questions based on the behavior of multiple different VQA models. We propose to cluster the entropy values of the predicted answer distributions obtained by three different models: a baseline method that takes images and questions as input, and two variants that take images only and questions only. We use simple k-means to cluster the visual questions of the VQA v2 validation set. Then we use state-of-the-art methods to determine the accuracy and the entropy of the answer distributions for each cluster. A benefit of the proposed method is that no annotation of difficulty is required, because the accuracy of each cluster reflects the difficulty of the visual questions that belong to it. Our approach can identify clusters of difficult visual questions that are not answered correctly by state-of-the-art methods. Detailed analysis on the VQA v2 dataset reveals that 1) all methods show poor performance on the most difficult cluster (about 10% accuracy), 2) as the cluster difficulty increases, the answers predicted by the different methods begin to differ, and 3) the values of cluster entropy are highly correlated with the cluster accuracy. We show that our approach has the advantage of being able to assess the difficulty of visual questions without ground truth (i.e., the test set of VQA v2) by assigning them to one of the clusters. We expect that this can stimulate the development of novel directions of research and new algorithms. Clustering results are available online at https://github.com/tttamaki/vqd.
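
The clustering step described in the abstract can be illustrated with a minimal sketch. It assumes each visual question is represented by three entropy values, one per model (image+question, image only, question only), and groups those 3-D vectors with a plain k-means. Everything here is an assumption made for illustration: the function names, the toy answer distributions, the choice of k = 2, and the pure-Python k-means, which stands in for whatever implementation the authors actually used.

```python
import math
import random


def entropy(probs):
    """Shannon entropy (in nats) of a discrete answer distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_features(dist_iq, dist_i, dist_q):
    """3-D feature for one visual question (hypothetical helper).

    Inputs are the predicted answer distributions of the image+question,
    image-only, and question-only models.
    """
    return (entropy(dist_iq), entropy(dist_i), entropy(dist_q))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: mean of the members; keep the old centroid if empty.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Toy answer distributions over four candidate answers (made up, not real data).
peaked_a = [0.97, 0.01, 0.01, 0.01]
peaked_b = [0.94, 0.02, 0.02, 0.02]
flat_a = [0.25, 0.25, 0.25, 0.25]
flat_b = [0.30, 0.30, 0.20, 0.20]

features = [
    entropy_features(peaked_a, peaked_b, peaked_a),  # all models confident
    entropy_features(peaked_b, peaked_a, peaked_b),  # all models confident
    entropy_features(flat_a, flat_b, flat_a),        # all models uncertain
    entropy_features(flat_b, flat_a, flat_b),        # all models uncertain
]

centroids, labels = kmeans(features, k=2)
print(labels)
```

In this toy setting the low-entropy and high-entropy questions end up in separate clusters. With real model outputs, per-cluster accuracy would then be computed on questions that have ground-truth answers, and unseen questions could be assigned to the nearest centroid.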


