PopulAtion Parameter Averaging (PAPA)

Ensemble methods combine the predictions of multiple models to improve performance, but they incur significantly higher computation costs at inference time. To avoid these costs, multiple neural networks can be combined into one by averaging their weights (model soups). However, this usually performs significantly worse than ensembling. Weight averaging is only beneficial when the weights are similar enough (in weight or feature space) to average well, yet different enough for the combination to add value. Based on this idea, we propose PopulAtion Parameter Averaging (PAPA): a method that combines the generality of ensembling with the efficiency of weight averaging. PAPA leverages a population of diverse models (trained on different data orders, augmentations, and regularizations) while occasionally (not too often, not too rarely) replacing the weights of the networks with the population average of the weights. PAPA reduces the performance gap between averaging and ensembling, increasing the average accuracy of a population of models by up to 1.1% on CIFAR-10 and 2.4% on [...], compared with independent (non-averaged) models.
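The averaging step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`average_population`, `sgd_step`, `papa_train`), the `avg_every` interval, and the toy quadratic objective are all assumptions made for illustration. Each population member takes noisy gradient steps on its own, and every `avg_every` steps all members are replaced by the population mean.

```python
import random


def average_population(population):
    """Return the element-wise mean of equally sized weight vectors."""
    n = len(population)
    return [sum(ws) / n for ws in zip(*population)]


def sgd_step(weights, target, lr, rng):
    """One noisy gradient step on the toy loss 0.5 * ||w - target||^2.

    The Gaussian noise is a stand-in (assumption) for the different data
    orders and augmentations each population member would see.
    """
    return [w - lr * ((w - t) + rng.gauss(0.0, 0.1))
            for w, t in zip(weights, target)]


def papa_train(pop_size=4, dim=3, steps=100, avg_every=10, lr=0.1, seed=0):
    """Train members independently; every avg_every steps, replace all
    members' weights with the population average (hypothetical schedule)."""
    target = [1.0] * dim
    # One random generator per member, so members diverge between averages.
    rngs = [random.Random(seed + i) for i in range(pop_size)]
    population = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for rng in rngs]
    for step in range(1, steps + 1):
        population = [sgd_step(w, target, lr, rng)
                      for w, rng in zip(population, rngs)]
        if step % avg_every == 0:
            mean = average_population(population)
            population = [list(mean) for _ in range(pop_size)]
    return population


if __name__ == "__main__":
    final = papa_train()
    print(average_population(final))
```

The `avg_every` value controls the trade-off the abstract points to: averaging too often removes the diversity between members, while averaging too rarely lets the weights drift too far apart to average well. The value used here is arbitrary.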




Hierarchical Weight Averaging for Deep Neural Networks

Despite the simplicity, stochastic gradient descent (SGD)-like algorithm...

Diverse Weight Averaging for Out-of-Distribution Generalization

Standard neural networks struggle to generalize under distribution shift...

Lottery Pools: Winning More by Interpolating Tickets without Increasing Training or Inference Cost

Lottery tickets (LTs) are able to discover accurate and sparse subnetwork...

Stop Wasting My Time! Saving Days of ImageNet and BERT Training with Latest Weight Averaging

Training vision or language models on large datasets can take days, if n...

Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well

We propose Stochastic Weight Averaging in Parallel (SWAP), an algorithm ...

Anytime Tail Averaging

Tail averaging consists in averaging the last examples in a stream. Comm...

Code Repositories


Repository for the PopulAtion Parameter Averaging (PAPA) paper

