Characterizing and overcoming the greedy nature of learning in multi-modal deep neural networks

02/10/2022
by Nan Wu, et al.

We hypothesize that due to the greedy nature of learning in multi-modal deep neural networks, these models tend to rely on just one modality while under-fitting the other modalities. Such behavior is counter-intuitive and hurts the models' generalization, as we observe empirically. To estimate the model's dependence on each modality, we compute the gain in accuracy when the model has access to that modality in addition to another one. We refer to this gain as the conditional utilization rate. In our experiments, we consistently observe an imbalance in conditional utilization rates between modalities across multiple tasks and architectures. Since the conditional utilization rate cannot be computed efficiently during training, we introduce a proxy for it based on the pace at which the model learns from each modality, which we refer to as the conditional learning speed. We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning. The proposed algorithm improves the model's generalization on three datasets: Colored MNIST, Princeton ModelNet40, and NVIDIA Dynamic Hand Gesture.
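To make the conditional utilization rate concrete, here is a minimal Python sketch. It assumes the rate is the accuracy gain normalized by the two-modality accuracy, u(m1|m0) = (A(m0, m1) - A(m0)) / A(m0, m1); the abstract only speaks of a "gain", so the paper's exact definition may differ. The accuracies are taken as inputs, and how the single-modality accuracy is measured is left abstract. The function `choose_branch_to_update` and its `threshold` parameter are hypothetical: a toy rebalancing rule in the spirit of balancing conditional learning speeds, not the authors' algorithm, with the speed estimates simply passed in as numbers.

```python
"""Toy illustration of conditional utilization rates and a rebalancing rule.

Assumptions (not taken from the paper): the utilization rate is normalized by
the accuracy with both modalities, and the rebalancing rule below is a
hypothetical stand-in for the authors' algorithm.
"""
from typing import Optional, Tuple


def conditional_utilization_rate(acc_both: float, acc_without: float) -> float:
    """Relative accuracy gain from adding one modality to the other.

    acc_both:    accuracy when both modalities are available, A(m0, m1)
    acc_without: accuracy when only the *other* modality is available
    Assumed normalization: divide the gain by acc_both.
    """
    if acc_both <= 0:
        raise ValueError("acc_both must be positive")
    return (acc_both - acc_without) / acc_both


def utilization_imbalance(
    acc_both: float, acc_only_0: float, acc_only_1: float
) -> Tuple[float, float, float]:
    """Return u(m1|m0), u(m0|m1) and their difference."""
    u_1_given_0 = conditional_utilization_rate(acc_both, acc_only_0)  # gain from adding m1
    u_0_given_1 = conditional_utilization_rate(acc_both, acc_only_1)  # gain from adding m0
    return u_1_given_0, u_0_given_1, u_1_given_0 - u_0_given_1


def choose_branch_to_update(
    speed_0: float, speed_1: float, threshold: float
) -> Optional[int]:
    """Hypothetical rebalancing rule driven by per-modality learning speeds.

    If modality 0 is being learned faster than modality 1 by more than
    `threshold`, update only branch 1 on the next step (and vice versa);
    otherwise return None, meaning a normal joint update of both branches.
    """
    gap = speed_0 - speed_1
    if gap > threshold:
        return 1
    if gap < -threshold:
        return 0
    return None


if __name__ == "__main__":
    # Illustrative (made-up) accuracies: the model leans heavily on modality 0.
    u10, u01, gap = utilization_imbalance(acc_both=0.90, acc_only_0=0.88, acc_only_1=0.60)
    print(round(u10, 3), round(u01, 3), round(gap, 3))  # → 0.022 0.333 -0.311
    print(choose_branch_to_update(speed_0=0.5, speed_1=0.1, threshold=0.2))  # → 1
```

With these made-up numbers, adding modality 1 to modality 0 raises accuracy by only about 2% (relative), while adding modality 0 to modality 1 raises it by about 33%: the kind of imbalance in conditional utilization rates the abstract describes. The toy rule then responds by updating only the slower-learning branch until the speeds are within the threshold.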

Related research
03/23/2022

Modality Competition: What Makes Joint Training of Multi-modal Network Fail in Deep Learning? (Provably)

Despite the remarkable success of deep multi-modal learning in practice,...
09/12/2023

Enhancing Multi-modal Cooperation via Fine-grained Modality Valuation

One primary topic of multi-modal learning is to jointly incorporate hete...
04/11/2023

Investigating Imbalances Between SAR and Optical Utilization for Multi-Modal Urban Mapping

Accurate urban maps provide essential information to support sustainable...
11/28/2022

Pitfalls of Conditional Batch Normalization for Contextual Multi-Modal Learning

Humans have perfected the art of learning from multiple modalities throu...
12/31/2014

ModDrop: adaptive multi-modal gesture recognition

We present a method for gesture detection and localisation based on mult...
03/06/2016

Variational methods for Conditional Multimodal Deep Learning

In this paper, we address the problem of conditional modality learning, ...
10/21/2020

Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies

Many recent datasets contain a variety of different data modalities, for...
