SortedNet, a Place for Every Network and Every Network in its Place: Towards a Generalized Solution for Training Many-in-One Neural Networks

by Mojtaba Valipour, et al.

As the size of deep learning models continues to grow, finding optimal models under memory and computation constraints becomes increasingly important. Although the architecture and constituent building blocks of neural networks usually allow them to be used in a modular way, their training process is not aware of this modularity. Consequently, conventional neural network training lacks the flexibility to adapt the computational load of the model during inference. This paper proposes SortedNet, a generalized and scalable solution to harness the inherent modularity of deep neural networks across various dimensions for efficient dynamic inference. Our training considers a nested architecture for the sub-models with shared parameters and trains them together with the main model in a sorted and probabilistic manner. This sorted training of sub-networks enables us to scale the number of sub-networks to hundreds using a single round of training. We utilize a novel updating scheme during training that combines random sampling of sub-networks with gradient accumulation to improve training efficiency. Furthermore, the sorted nature of our training leads to search-free sub-network selection at inference time, and the nested architecture of the resulting sub-networks leads to minimal storage requirements and efficient switching between sub-networks at inference. Our general dynamic training approach is demonstrated across various architectures and tasks, including large language models and pre-trained vision models. Experimental results show the efficacy of the proposed approach in achieving efficient sub-networks while outperforming state-of-the-art dynamic training approaches. Our findings demonstrate the feasibility of training up to 160 different sub-models simultaneously, showcasing the extensive scalability of our proposed method while maintaining 96
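The idea of nested sub-models that share parameters and are trained by random sampling with gradient accumulation can be illustrated with a toy sketch. This is not the authors' implementation: the linear model, the names `predict`, `train_sorted`, `make_data`, and `mse`, and all hyperparameters are assumptions chosen only to make the mechanism concrete. Here a sub-model of width k uses the first k weights of one shared weight vector, so every smaller model is a prefix of the larger ones.

```python
import random


def predict(weights, x, width):
    """Sub-model of a given width: uses only the first `width` shared weights."""
    return sum(w * xi for w, xi in zip(weights[:width], x[:width]))


def make_data(n, coeffs, seed=0):
    """Hypothetical toy regression data with independent features in [-1, 1]."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in coeffs]
        y = sum(c * xi for c, xi in zip(coeffs, x))
        data.append((x, y))
    return data


def train_sorted(data, dim, widths, steps=2000, accum=2, lr=0.05, seed=0):
    """Sample nested sub-models at random and accumulate their gradients.

    Assumption: squared-error loss and plain SGD; the paper's actual loss,
    optimizer, and sampling distribution may differ.
    """
    rng = random.Random(seed)
    weights = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for _ in range(accum):
            width = rng.choice(widths)   # randomly pick one sub-model
            x, y = rng.choice(data)      # one training example
            err = predict(weights, x, width) - y
            # Only the weights inside the sampled prefix receive gradient.
            for i in range(width):
                grad[i] += 2.0 * err * x[i] / accum
        # Single update of the shared parameters after accumulation.
        for i in range(dim):
            weights[i] -= lr * grad[i]
    return weights


def mse(weights, data, width):
    """Mean squared error of the sub-model with the given width."""
    return sum((predict(weights, x, width) - y) ** 2 for x, y in data) / len(data)


if __name__ == "__main__":
    data = make_data(200, [4.0, 2.0, 1.0, 0.5])
    w = train_sorted(data, dim=4, widths=[1, 2, 3, 4])
    for k in (1, 2, 3, 4):
        print(k, round(mse(w, data, k), 4))
```

Because every sub-model is a prefix of the same weight vector, only one set of parameters is stored, and switching between sub-models at inference amounts to choosing a width, with no search step.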




Reducing the Training Time of Neural Networks by Partitioning

This paper presents a new method for pre-training neural networks that c...

DRESS: Dynamic REal-time Sparse Subnets

The limited and dynamically varied resources on edge devices motivate us...

Rethinking FUN: Frequency-Domain Utilization Networks

The search for efficient neural network architectures has gained much fo...

A Hardware-Aware System for Accelerating Deep Neural Network Optimization

Recent advances in Neural Architecture Search (NAS) which extract specia...

AdaSelection: Accelerating Deep Learning Training through Data Subsampling

In this paper, we introduce AdaSelection, an adaptive sub-sampling metho...

TIPS: Topologically Important Path Sampling for Anytime Neural Networks

Anytime neural networks (AnytimeNNs) are a promising solution to adaptiv...
