Most works on transformers trained with the Masked Language Modeling (ML...
In many practical scenarios – like hyperparameter search or continual re...
Practitioners prune neural networks for efficiency gains and generalizat...
Methods for improving the efficiency of deep network training (i.e. the ...
Practitioners frequently observe that pruning improves model generalizat...
Modern deep learning involves training costly, highly overparameterized ...
Legal literature on machine learning (ML) tends to focus on harms, and a...
A striking observation about iterative magnitude pruning (IMP; Frankle e...
Benchmarking the tradeoff between neural network accuracy and training t...
AI's rapid growth has been felt acutely by scholarly venues, leading to ...
As datasets and models become increasingly large, distributed training h...
Studying neural network loss landscapes provides insights into the natur...
Magnitude pruning is a common, effective technique to identify sparse su...
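The magnitude criterion mentioned above can be sketched in a few lines. A minimal illustration, assuming a NumPy weight array and a target sparsity level (the function name and shapes here are illustrative, not taken from any specific paper):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude entries until `sparsity` fraction
    of the weights are removed; returns the resulting binary mask."""
    k = int(sparsity * weights.size)  # number of weights to remove
    if k == 0:
        return np.ones_like(weights)
    threshold = np.sort(np.abs(weights).ravel())[k - 1]
    return (np.abs(weights) > threshold).astype(weights.dtype)

w = np.array([[0.5, -0.1], [0.05, -0.9]])
mask = magnitude_prune(w, sparsity=0.5)  # keeps the 2 largest magnitudes
```

Applying `w * mask` then yields the pruned subnetwork's weights.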
The computer vision world has been regaining enthusiasm in various pre-...
We revisit and extend the experiments of Goodfellow et al. (2014), who s...
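The Goodfellow et al. (2014) experiments referenced here evaluate the loss along the straight line between two parameter vectors. A minimal sketch with a toy quadratic loss (the function name and setup are illustrative):

```python
import numpy as np

def interpolate_loss(loss_fn, theta_a, theta_b, num=5):
    """Evaluate a loss along the line (1-a)*theta_a + a*theta_b, the
    one-dimensional slice used in Goodfellow et al. (2014)-style plots."""
    return [loss_fn((1 - a) * theta_a + a * theta_b)
            for a in np.linspace(0.0, 1.0, num)]

# Toy example: a quadratic loss evaluated between two parameter vectors.
losses = interpolate_loss(lambda th: float(np.sum(th ** 2)),
                          np.array([-1.0, 0.0]), np.array([1.0, 0.0]), num=5)
```

In practice `theta_a` and `theta_b` would be flattened network weights and `loss_fn` the training or test loss.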
Self-supervised learning has recently begun to rival supervised learning...
Recent work has explored the possibility of pruning neural networks at i...
In natural language processing (NLP), enormous pre-trained models like B...
We show that the error of magnitude-pruned networks follows a scaling la...
Neural network pruning—the task of reducing the size of a network by rem...
Many neural network pruning algorithms proceed in three steps: train the...
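A toy end-to-end illustration of that train → prune → retrain pipeline, assuming a simple linear-regression setting in NumPy (the data, sparsity level, and hyperparameters are illustrative, not drawn from any of the papers above):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_w = np.zeros(10)
true_w[:3] = [2.0, -1.5, 1.0]      # sparse ground-truth weights
y = X @ true_w

def train(X, y, w, mask, steps=500, lr=0.05):
    """Gradient descent on squared error, keeping pruned weights at zero."""
    for _ in range(steps):
        grad = X.T @ (X @ (w * mask) - y) / len(y)
        w = (w - lr * grad) * mask
    return w

mask = np.ones(10)
w = train(X, y, np.zeros(10), mask)   # step 1: train to completion
keep = np.argsort(np.abs(w))[-3:]     # step 2: prune to the 3 largest weights
mask = np.zeros(10)
mask[keep] = 1.0
w = train(X, y, w * mask, mask)       # step 3: retrain the surviving weights
```

Here retraining recovers the sparse solution exactly because the ground truth is itself sparse; real networks only approximately recover the dense model's accuracy.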
Batch normalization (BatchNorm) has become an indispensable tool for tra...
Recent studies have shown that many important aspects of neural network ...
We introduce "instability analysis," a framework for assessing whether t...
Pruning is a standard technique for removing unnecessary structure from ...
Recent work on the "lottery ticket hypothesis" proposes that randomly-in...
Neural network compression techniques are able to reduce the parameter c...
Recent work on neural network pruning indicates that, at training time, ...