GMP*: Well-Tuned Global Magnitude Pruning Can Outperform Most BERT-Pruning Methods

10/12/2022
by Eldar Kurtic, et al.

We revisit the performance of the classic gradual magnitude pruning (GMP) baseline for large language models, focusing on the standard BERT benchmark across various popular tasks. Despite existing evidence in the literature that GMP performs poorly, we show that a simple and general variant, which we call GMP*, can match and sometimes outperform more complex state-of-the-art methods. Our results provide a simple yet strong baseline for future work, highlight the importance of parameter tuning for baselines, and even improve the performance of the state-of-the-art second-order pruning method in this setting.
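
For readers unfamiliar with the method, below is a minimal sketch of gradual global magnitude pruning, assuming a PyTorch model and the cubic sparsity schedule of Zhu & Gupta (2017). The function names and the choice to prune only Linear layers are illustrative assumptions, not details taken from the paper or its code; GMP* additionally depends on the careful hyperparameter tuning that the abstract highlights.

```python
import torch

def target_sparsity(step, start_step, end_step, final_sparsity):
    # Cubic schedule (Zhu & Gupta, 2017): sparsity ramps smoothly
    # from 0 at start_step to final_sparsity at end_step.
    if step <= start_step:
        return 0.0
    if step >= end_step:
        return final_sparsity
    progress = (step - start_step) / (end_step - start_step)
    return final_sparsity * (1.0 - (1.0 - progress) ** 3)

def global_magnitude_prune(model, sparsity):
    # "Global": a single magnitude threshold is computed over all prunable
    # weights at once, rather than per layer. Pruning only Linear layers
    # is an assumption of this sketch.
    weights = [m.weight for m in model.modules() if isinstance(m, torch.nn.Linear)]
    scores = torch.cat([w.detach().abs().flatten() for w in weights])
    k = int(sparsity * scores.numel())
    if k == 0:
        return
    threshold = torch.kthvalue(scores, k).values
    with torch.no_grad():
        for w in weights:
            w.mul_((w.abs() > threshold).to(w.dtype))

# In a fine-tuning loop, one would call global_magnitude_prune every few
# hundred steps with the scheduled sparsity and keep training in between so
# the remaining weights can recover; a full implementation would also keep a
# fixed mask so already-pruned weights stay at zero between pruning steps.
```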

Related research

A Deeper Look at the Layerwise Sparsity of Magnitude-based Pruning (10/15/2020)
Structured Pruning of Large Language Models (10/10/2019)
Is Complexity Required for Neural Network Pruning? A Case Study on Global Magnitude Pruning (09/29/2022)
A Simple and Effective Pruning Approach for Large Language Models (06/20/2023)
How Well Do Sparse Imagenet Models Transfer? (11/26/2021)
AUBER: Automated BERT Regularization (09/30/2020)
BERMo: What can BERT learn from ELMo? (10/18/2021)
