Meta Module Network for Compositional Visual Reasoning

by Wenhu Chen, et al.

There are two main lines of research on visual reasoning: neural module networks (NMN), which perform explicit multi-hop reasoning through handcrafted neural modules, and monolithic networks, which reason implicitly in the latent feature space. The former excel in interpretability and compositionality, while the latter usually achieve better performance due to model flexibility and parameter efficiency. To bridge the gap between the two, we present Meta Module Network (MMN), a novel hybrid approach that uses a single Meta Module to perform versatile functionalities, while preserving compositionality and interpretability through modularized design. The proposed model first parses an input question into a functional program through a Program Generator. Instead of handcrafting a task-specific network to represent each function, as in traditional NMN, we use a Recipe Encoder to translate the functions into their corresponding recipes (specifications), which are used to dynamically instantiate the Meta Module into Instance Modules. To endow different instance modules with designated functionality, a Teacher-Student framework is proposed, where a symbolic teacher pre-executes against the scene graphs to provide guidelines for the instantiated modules (students) to follow. In a nutshell, MMN adopts the meta module to increase its parameterization efficiency and uses recipe encoding to improve its generalization ability over NMN. Experiments conducted on the GQA benchmark demonstrate that: (i) MMN achieves significant improvement over both NMN and monolithic network baselines; (ii) MMN is able to generalize to unseen but related functions.
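The pipeline described above (program, recipe, instantiated module, teacher guidance) can be illustrated with a toy sketch. This is not the authors' implementation: every name here (encode_recipe, MetaModule, run_program, kl_divergence), the hash-seeded token vectors, the dot-product attention, and the use of a KL divergence as the teacher-student signal are assumptions chosen only to keep the example small and runnable with the Python standard library.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# One program step: (function name, its arguments, indices of earlier steps it reads).
Step = Tuple[str, Tuple[str, ...], Tuple[int, ...]]
InstanceModule = Callable[[List[Vector], List[Vector]], Vector]


def softmax(scores: Sequence[float]) -> Vector:
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def encode_recipe(function: str, args: Sequence[str], dim: int = 8) -> Vector:
    """Toy stand-in for a recipe encoder: sum of fixed pseudo-random token vectors."""
    recipe = [0.0] * dim
    for token in (function, *args):
        rng = random.Random(token)  # string seed gives a repeatable vector per token
        for i in range(dim):
            recipe[i] += rng.uniform(-1.0, 1.0)
    return recipe


class MetaModule:
    """One set of shared weights, specialised per function by its recipe (assumed design)."""

    def __init__(self, dim: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.weights = [[rng.gauss(0.0, 0.5) for _ in range(dim)] for _ in range(dim)]

    def instantiate(self, recipe: Vector) -> InstanceModule:
        # Project the recipe through the shared weights to get a per-function query.
        query = [sum(w * r for w, r in zip(row, recipe)) for row in self.weights]

        def instance(objects: List[Vector], dependencies: List[Vector]) -> Vector:
            # Score each object against the query, then return attention over objects.
            scores = [sum(q * x for q, x in zip(query, obj)) for obj in objects]
            for dep in dependencies:  # bias towards objects that earlier steps attended to
                scores = [s + d for s, d in zip(scores, dep)]
            return softmax(scores)

        return instance


def run_program(program: List[Step], meta: MetaModule, objects: List[Vector]) -> List[Vector]:
    """Execute the steps in order; each step gets its own instance of the meta module."""
    outputs: List[Vector] = []
    for function, args, dep_indices in program:
        module = meta.instantiate(encode_recipe(function, args))
        outputs.append(module(objects, [outputs[i] for i in dep_indices]))
    return outputs


def kl_divergence(teacher: Vector, student: Vector, eps: float = 1e-9) -> float:
    """Assumed guidance signal: how far the student's attention is from the teacher's."""
    return sum(t * math.log((t + eps) / (s + eps)) for t, s in zip(teacher, student) if t > 0)


# Hypothetical three-step program over four random object feature vectors.
program: List[Step] = [
    ("select", ("ball",), ()),
    ("relate", ("left",), (0,)),
    ("query", ("color",), (1,)),
]
feature_rng = random.Random(1)
objects = [[feature_rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(4)]
outputs = run_program(program, MetaModule(), objects)

# A symbolic teacher would derive this target from the scene graph; here it is made up.
teacher_attention = [1.0, 0.0, 0.0, 0.0]
loss = kl_divergence(teacher_attention, outputs[0])
```

The point of the sketch is the parameter sharing: all three steps reuse the same MetaModule weights and differ only in the recipe passed to instantiate, so a function never seen in training could in principle still be instantiated from its recipe.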


