A Corpus for Reasoning About Natural Language Grounded in Photographs

11/01/2018
by Alane Suhr, et al.

We introduce a new dataset for joint reasoning about language and vision. The data contains 107,296 examples of English sentences paired with web photographs. The task is to determine whether a natural language caption is true of a photograph. We present an approach for finding visually complex images and crowdsourcing linguistically diverse captions. Qualitative analysis shows that the data requires complex reasoning about quantities, comparisons, and relationships between objects. Evaluation of state-of-the-art visual reasoning methods shows that the data presents a significant challenge for current approaches.
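The task described above can be framed as binary classification over sentence-photograph pairs. As a minimal sketch of that framing (the dataset's actual schema is not given in this abstract; the field names and the majority-class baseline below are illustrative assumptions, not the authors' method):

```python
from dataclasses import dataclass

@dataclass
class Example:
    """One example: an English sentence paired with a web photograph.

    Field names are illustrative placeholders, not the dataset's real schema.
    """
    sentence: str   # natural language caption
    image_id: str   # identifier of the paired photograph
    label: bool     # True if the caption is true of the photograph

def majority_baseline(train, test):
    """Predict the most frequent training label for every test example."""
    true_count = sum(ex.label for ex in train)
    majority = true_count >= len(train) - true_count
    return [majority for _ in test]

# Tiny toy split to exercise the baseline.
train = [
    Example("There are two dogs.", "img_001", True),
    Example("The bottle is empty.", "img_002", False),
    Example("At least one person wears a hat.", "img_003", True),
]
test = [Example("Exactly three cups are visible.", "img_004", False)]
print(majority_baseline(train, test))  # -> [True]
```

A baseline like this gives a floor against which visual reasoning models can be compared; the abstract's point is that current models do not get far above such floors on this data.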


