Look, Remember and Reason: Visual Reasoning with Grounded Rationales

06/30/2023
by Apratim Bhattacharyya, et al.

Large language models have recently shown human-level performance on a variety of reasoning tasks. However, the ability of these models to perform complex visual reasoning has not yet been studied in detail. A key challenge in many visual reasoning tasks is that the visual information needs to be tightly integrated into the reasoning process. We propose to address this challenge by drawing inspiration from human visual problem solving, which depends on a variety of low-level visual capabilities. Such problem solving can often be cast as the three-step process of “Look, Remember, Reason”: visual information is incrementally extracted using low-level visual routines in a step-by-step fashion until a final answer is reached. We follow the same paradigm to enable existing large language models, with minimal changes to the architecture, to solve visual reasoning problems. To this end, we introduce rationales over the visual input that allow us to integrate low-level visual capabilities, such as object recognition and tracking, as surrogate tasks. We show performance competitive with state-of-the-art models designed specifically for these tasks on diverse visual reasoning tasks from the CLEVR, CATER, and ACRE datasets.
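The "Look, Remember, Reason" loop described in the abstract can be illustrated with a toy sketch. This is not the authors' model: the scene is a hand-written list of objects rather than an image, `look` is a stand-in for a low-level visual routine such as object recognition, and every name here (`SceneObject`, `look`, `look_remember_reason`) is hypothetical. It only shows the control flow of extracting observations one step at a time, storing them as a rationale, and answering from that rationale.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    """Hypothetical stand-in for one object in a CLEVR-style scene."""
    color: str
    shape: str


def look(scene, index):
    """Stand-in for a low-level visual routine: 'recognize' one object.

    A real system would extract this from pixels; here it is a lookup.
    """
    obj = scene[index]
    return (obj.color, obj.shape)


def look_remember_reason(scene, target_color):
    """Answer 'how many objects are <target_color>?' step by step."""
    rationale = []  # the remembered, grounded observations
    for index in range(len(scene)):
        observation = look(scene, index)  # Look
        rationale.append(observation)     # Remember
    # Reason: the answer is computed only from the remembered rationale.
    answer = sum(1 for color, _shape in rationale if color == target_color)
    return rationale, answer


scene = [
    SceneObject("red", "cube"),
    SceneObject("blue", "sphere"),
    SceneObject("red", "cylinder"),
]
rationale, answer = look_remember_reason(scene, "red")
print(rationale)
print(answer)
```

In this sketch the final answer depends only on the stored rationale, which mirrors the idea that the reasoning step is grounded in the observations gathered along the way.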

research
05/22/2023

Enhance Reasoning Ability of Visual-Language Models via Large Language Models

Pre-trained visual language models (VLM) have shown excellent performanc...
research
09/08/2023

Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models

Vision-language models (VLMs) have recently demonstrated strong efficacy...
research
03/06/2023

PaLM-E: An Embodied Multimodal Language Model

Large language models excel at a wide range of complex tasks. However, e...
research
07/21/2023

How to Tidy Up a Table: Fusing Visual and Semantic Commonsense Reasoning for Robotic Tasks with Vague Objectives

Vague objectives in many real-life scenarios pose long-standing challeng...
research
07/10/2017

Learning Visual Reasoning Without Strong Priors

Achieving artificial visual reasoning - the ability to answer image-rela...
research
11/26/2020

Transformation Driven Visual Reasoning

This paper defines a new visual reasoning paradigm by introducing an imp...
research
05/31/2023

Let's Verify Step by Step

In recent years, large language models have greatly improved in their ab...
