The Unreasonable Effectiveness of Fully-Connected Layers for Low-Data Regimes

10/11/2022
by Peter Kocsis, et al.

Convolutional neural networks were the standard for solving many computer vision tasks until recently, when Transformer- or MLP-based architectures started to show competitive performance. These architectures typically have a vast number of weights and need to be trained on massive datasets; hence, they are not suitable for use in low-data regimes. In this work, we propose a simple yet effective framework to improve generalization from small amounts of data. We augment modern CNNs with fully-connected (FC) layers and show the massive impact this architectural change has in low-data regimes. We further present an online joint knowledge-distillation method to utilize the extra FC layers during training while avoiding them at test time. This allows us to improve the generalization of a CNN-based model without any increase in the number of weights at test time. We perform classification experiments for a wide range of network backbones and several standard datasets in supervised and active-learning settings. Our method significantly outperforms the baseline networks without fully-connected layers, reaching a relative improvement of up to 16% validation accuracy in the supervised setting without adding any extra parameters during inference.

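To make the described setup more concrete, below is a minimal PyTorch-style sketch of the idea as stated in the abstract: a CNN backbone augmented with an extra FC head during training, plus an online joint distillation loss that pulls the plain CNN head toward the FC-augmented head so the extra layers can be dropped at inference. The class and function names (`FCAugmentedCNN`, `joint_distillation_loss`) and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Sketch only: FC-augmented CNN with online joint distillation (assumed setup,
# not the paper's code). Requires torch and torchvision.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18


class FCAugmentedCNN(nn.Module):
    def __init__(self, num_classes: int = 10, hidden_dim: int = 512):
        super().__init__()
        backbone = resnet18()                 # randomly initialized backbone
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()           # keep only the convolutional feature extractor
        self.backbone = backbone
        self.cnn_head = nn.Linear(feat_dim, num_classes)  # used at train AND test time
        self.fc_head = nn.Sequential(                     # extra FC layers, train time only
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, x):
        feats = self.backbone(x)
        return self.cnn_head(feats), self.fc_head(feats)


def joint_distillation_loss(cnn_logits, fc_logits, targets, alpha=0.5, temperature=2.0):
    """Supervised loss on both heads plus a KL term distilling the FC-augmented
    head into the plain CNN head (online, no separately pretrained teacher)."""
    ce = F.cross_entropy(cnn_logits, targets) + F.cross_entropy(fc_logits, targets)
    kd = F.kl_div(
        F.log_softmax(cnn_logits / temperature, dim=1),
        F.softmax(fc_logits.detach() / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    return ce + alpha * kd


# At inference, only the unchanged CNN path is evaluated, so no extra
# parameters are used at test time:
#   logits, _ = model(images)
```

Under these assumptions, the FC branch acts purely as a training-time regularizer/teacher; discarding it after training leaves a standard CNN with the original parameter count.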
