One-shot Visual Reasoning on RPMs with an Application to Video Frame Prediction

11/24/2021
by   Wentao He, et al.
0

Raven's Progressive Matrices (RPMs) are frequently used in evaluating human's visual reasoning ability. Researchers have made considerable effort in developing a system which could automatically solve the RPM problem, often through a black-box end-to-end Convolutional Neural Network (CNN) for both visual recognition and logical reasoning tasks. Towards the objective of developing a highly explainable solution, we propose a One-shot Human-Understandable ReaSoner (Os-HURS), which is a two-step framework including a perception module and a reasoning module, to tackle the challenges of real-world visual recognition and subsequent logical reasoning tasks, respectively. For the reasoning module, we propose a "2+1" formulation that can be better understood by humans and significantly reduces the model complexity. As a result, a precise reasoning rule can be deduced from one RPM sample only, which is not feasible for existing solution methods. The proposed reasoning module is also capable of yielding a set of reasoning rules, precisely modeling the human knowledge in solving the RPM problem. To validate the proposed method on real-world applications, an RPM-like One-shot Frame-prediction (ROF) dataset is constructed, where visual reasoning is conducted on RPMs constructed using real-world video frames instead of synthetic images. Experimental results on various RPM-like datasets demonstrate that the proposed Os-HURS achieves a significant and consistent performance gain compared with the state-of-the-art models.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset