Lottery Pools: Winning More by Interpolating Tickets without Increasing Training or Inference Cost

by Lu Yin, et al.

Lottery tickets (LTs) are accurate, sparse subnetworks that can be trained in isolation to match the performance of dense networks. Ensembling, in parallel, is one of the oldest time-proven tricks in machine learning: it improves performance by combining the outputs of multiple independent models. However, the benefits of ensembling are diluted in the context of LTs, since an ensemble does not directly produce stronger sparse subnetworks; it only leverages their predictions to reach a better decision. In this work, we first observe that directly averaging the weights of adjacent learned subnetworks significantly boosts the performance of LTs. Encouraged by this observation, we further propose an alternative way to perform an 'ensemble' over the subnetworks identified by iterative magnitude pruning, via a simple interpolating strategy. We call our method Lottery Pools. In contrast to the naive ensemble, which brings no performance gain to any single subnetwork, Lottery Pools yields much stronger sparse subnetworks than the original LTs without requiring any extra training or inference cost. Across various modern architectures on CIFAR-10/100 and ImageNet, we show that our method achieves significant performance gains in both in-distribution and out-of-distribution scenarios. Impressively, evaluated with VGG-16 and ResNet-18, the produced sparse subnetworks outperform the original LTs by up to 1.88 [...] 2.36 [...] dense-model up to 2.22 [...]
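
To make the weight-interpolation idea concrete, here is a minimal sketch of linearly interpolating two subnetworks and re-applying a sparsity mask. It is an illustration under stated assumptions, not the authors' implementation: the function name `interpolate_tickets`, the coefficient `alpha`, the toy weights, and the choice to re-apply the sparser ticket's mask after mixing are all hypothetical, and the abstract does not say how the interpolation coefficient is chosen.

```python
def interpolate_tickets(w_a, w_b, alpha, mask=None):
    """Return alpha * w_a + (1 - alpha) * w_b for each named weight vector.

    w_a, w_b: dicts mapping a layer name to a flat list of floats.
    mask: optional dict of 0/1 lists; if given, it is re-applied after mixing
          so the result keeps that sparsity pattern (an assumption made here).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    out = {}
    for name, a in w_a.items():
        b = w_b[name]
        # Element-wise linear interpolation between the two tickets.
        mixed = [alpha * x + (1.0 - alpha) * y for x, y in zip(a, b)]
        if mask is not None:
            # Zero out positions that are pruned in the target ticket.
            mixed = [m * v for m, v in zip(mask[name], mixed)]
        out[name] = mixed
    return out


# Toy example: a sparser ticket and its denser neighbour (hypothetical values).
ticket_sparse = {"fc": [0.7, -0.8, 0.0, 0.0]}
ticket_dense = {"fc": [0.5, -1.0, 0.2, 0.0]}
mask_sparse = {"fc": [1, 1, 0, 0]}

pooled = interpolate_tickets(ticket_sparse, ticket_dense, alpha=0.5, mask=mask_sparse)
```

With `alpha=0.5` this reduces to plain weight averaging of the two tickets. Because the mask is re-applied, the pooled subnetwork has the same number of nonzero weights as the sparser ticket, which is one way the result could avoid added inference cost.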




FreeTickets: Accurate, Robust and Efficient Deep Ensemble by Training with Dynamic Sparsity

Recent works on sparse neural networks have demonstrated that it is poss...

The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training

Random pruning is arguably the most naive way to attain sparsity in neur...

PopulAtion Parameter Averaging (PAPA)

Ensemble methods combine the predictions of multiple models to improve p...

How Well Do Sparse Imagenet Models Transfer?

Transfer learning is a classic paradigm by which models pretrained on la...

SWAMP: Sparse Weight Averaging with Multiple Particles for Iterative Magnitude Pruning

Given the ever-increasing size of modern neural networks, the significan...

Boost Neural Networks by Checkpoints

Training multiple deep neural networks (DNNs) and averaging their output...

Training independent subnetworks for robust prediction

Recent approaches to efficiently ensemble neural networks have shown tha...
