Gradient-based Weight Density Balancing for Robust Dynamic Sparse Training

10/25/2022
by   Mathias Parger, et al.

Training a sparse neural network from scratch requires optimizing the connections at the same time as the weights themselves. Typically, connections are redistributed after a predefined number of weight updates: a fraction of the parameters in each layer is removed and re-inserted at different locations within the same layer. The density of each layer is determined by heuristics, often based purely on the size of the parameter tensor. While the connections within each layer are re-optimized multiple times during training, the density of each layer typically remains constant. This leaves great unrealized potential, especially in scenarios with a high sparsity of 90% or more. We propose Gradient-based Redistribution, a technique which distributes weights across all layers, adding more weights to the layers that need them most. Our evaluation shows that our approach is less prone to unbalanced weight distribution at initialization than previous work and that it finds better-performing sparse subnetworks at very high sparsity levels.
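
The abstract does not give implementation details, so the following is only a minimal NumPy sketch of what one prune-and-regrow step with gradient-based cross-layer redistribution could look like. The function name redistribute_step, the prune_fraction parameter, and the layer dictionary layout ("weight", "mask", "grad") are illustrative assumptions, not the authors' API.

```python
# Minimal sketch (assumed details, not the paper's implementation):
# prune the smallest-magnitude active weights, then hand the freed budget
# to the layers whose inactive positions show the largest gradients.
import numpy as np

def redistribute_step(layers, prune_fraction=0.3):
    """layers: list of dicts with dense 'weight', binary 'mask', and 'grad'
    arrays of identical shape. Masks are updated in place."""
    # 1) Prune: drop the smallest-magnitude active weights in each layer.
    total_pruned = 0
    for layer in layers:
        w, m = layer["weight"], layer["mask"]
        active = np.flatnonzero(m)
        n_prune = int(prune_fraction * active.size)
        if n_prune == 0:
            continue
        order = np.argsort(np.abs(w.ravel()[active]))  # smallest first
        np.put(m, active[order[:n_prune]], 0)
        total_pruned += n_prune

    # 2) Redistribute the freed budget across layers. Instead of returning
    #    it to the layer it came from (size-based heuristics), each layer's
    #    share is proportional to its total gradient magnitude over
    #    currently inactive positions.
    scores = np.array([np.abs(l["grad"][l["mask"] == 0]).sum() for l in layers])
    shares = scores / scores.sum() if scores.sum() > 0 else np.full(len(layers), 1.0 / len(layers))
    # flooring keeps the total number of regrown weights at or below the budget
    budgets = np.floor(shares * total_pruned).astype(int)

    # 3) Regrow: activate the inactive positions with the largest gradients.
    for layer, budget in zip(layers, budgets):
        g, m = layer["grad"], layer["mask"]
        inactive = np.flatnonzero(m == 0)
        if budget == 0 or inactive.size == 0:
            continue
        order = np.argsort(-np.abs(g.ravel()[inactive]))  # largest first
        grow = inactive[order[:budget]]
        np.put(m, grow, 1)
        np.put(layer["weight"], grow, 0.0)  # new connections start at zero
```

Under this reading, layers whose masked-out positions accumulate large gradients receive a larger share of the pruned weights, so per-layer density can drift during training rather than staying fixed by a size-proportional heuristic.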

Related research

10/01/2021  Powerpropagation: A sparsity inducing weight reparameterisation
The training of sparse neural networks is becoming an increasingly impor...

07/10/2019  Sparse Networks from Scratch: Faster Training without Losing Performance
We demonstrate the possibility of what we call sparse learning: accelera...

02/22/2023  Considering Layerwise Importance in the Lottery Ticket Hypothesis
The Lottery Ticket Hypothesis (LTH) showed that by iteratively training ...

01/30/2022  Optimizing Gradient-driven Criteria in Network Sparsity: Gradient is All You Need
Network sparsity receives popularity mostly due to its capability to red...

06/02/2023  MLP-Mixer as a Wide and Sparse MLP
Multi-layer perceptron (MLP) is a fundamental component of deep learning...

09/30/2022  Sparse Random Networks for Communication-Efficient Federated Learning
One main challenge in federated learning is the large communication cost...

12/19/2019  Model Weight Theft With Just Noise Inputs: The Curious Case of the Petulant Attacker
This paper explores the scenarios under which an attacker can claim that...
