One-shot Visual Reasoning on RPMs with an Application to Video Frame Prediction

by   Wentao He, et al.

Raven's Progressive Matrices (RPMs) are frequently used in evaluating human's visual reasoning ability. Researchers have made considerable effort in developing a system which could automatically solve the RPM problem, often through a black-box end-to-end Convolutional Neural Network (CNN) for both visual recognition and logical reasoning tasks. Towards the objective of developing a highly explainable solution, we propose a One-shot Human-Understandable ReaSoner (Os-HURS), which is a two-step framework including a perception module and a reasoning module, to tackle the challenges of real-world visual recognition and subsequent logical reasoning tasks, respectively. For the reasoning module, we propose a "2+1" formulation that can be better understood by humans and significantly reduces the model complexity. As a result, a precise reasoning rule can be deduced from one RPM sample only, which is not feasible for existing solution methods. The proposed reasoning module is also capable of yielding a set of reasoning rules, precisely modeling the human knowledge in solving the RPM problem. To validate the proposed method on real-world applications, an RPM-like One-shot Frame-prediction (ROF) dataset is constructed, where visual reasoning is conducted on RPMs constructed using real-world video frames instead of synthetic images. Experimental results on various RPM-like datasets demonstrate that the proposed Os-HURS achieves a significant and consistent performance gain compared with the state-of-the-art models.


Progressive Reasoning by Module Composition

Humans learn to solve tasks of increasing complexity by building on top ...

A Data Augmentation Method by Mixing Up Negative Candidate Answers for Solving Raven's Progressive Matrices

Raven's Progressive Matrices (RPMs) are frequently-used in testing human...

Few-shot Visual Reasoning with Meta-analogical Contrastive Learning

While humans can solve a visual puzzle that requires logical reasoning b...

GAMR: A Guided Attention Model for (visual) Reasoning

Humans continue to outperform modern AI systems in their ability to flex...

Analyzing Recurrent Neural Network by Probabilistic Abstraction

Neural network is becoming the dominant approach for solving many real-w...

Visual Reasoning: from State to Transformation

Most existing visual reasoning tasks, such as CLEVR in VQA, ignore an im...

Transformation Driven Visual Reasoning

This paper defines a new visual reasoning paradigm by introducing an imp...

Please sign up or login with your details

Forgot password? Click here to reset