Revisiting Neural Scaling Laws in Language and Vision

09/13/2022
by Ibrahim Alabdulmohsin, et al.

The remarkable progress in deep learning in recent years is largely driven by improvements in scale, where bigger models are trained on larger datasets for longer schedules. To predict the benefit of scale empirically, we argue for a more rigorous methodology based on the extrapolation loss, instead of reporting the best-fitting (interpolating) parameters. We then present a recipe for estimating scaling law parameters reliably from learning curves. We demonstrate that it extrapolates more accurately than previous methods across a wide range of architecture families and several domains, including image classification, neural machine translation (NMT), and language modeling, in addition to tasks from the BIG-Bench evaluation benchmark. Finally, we release a benchmark dataset comprising 90 evaluation tasks to facilitate research in this domain.
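To make the evaluation protocol the abstract argues for concrete, here is a minimal sketch (not the authors' recipe) of fitting a saturating power law L(n) = a·n^(-b) + c to a learning curve and scoring it by extrapolation error on held-out larger scales. The functional form, the synthetic data values, and the holdout split are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(n, a, b, c):
    """Saturating power law: predicted loss = a * n**(-b) + c,
    where c is the irreducible loss as the scale n grows."""
    return a * np.power(n, -b) + c

# Illustrative learning-curve data (dataset size, validation loss);
# these values are synthetic, generated by hand from a power law.
n = np.array([1e4, 3e4, 1e5, 3e5, 1e6, 3e6, 1e7])
loss = np.array([3.52, 2.82, 2.26, 1.91, 1.63, 1.46, 1.32])

# Fit only on the smaller scales; hold out the two largest points.
n_fit, loss_fit = n[:-2], loss[:-2]
params, _ = curve_fit(scaling_law, n_fit, loss_fit,
                      p0=(10.0, 0.5, 0.5),
                      bounds=(0.0, np.inf), maxfev=10_000)

# Extrapolation loss: prediction error on the held-out scales,
# as opposed to the (interpolating) residuals on the fitted points.
pred = scaling_law(n[-2:], *params)
extrapolation_rmse = float(np.sqrt(np.mean((pred - loss[-2:]) ** 2)))
print("fitted (a, b, c):", params)
print(f"extrapolation RMSE on held-out scales: {extrapolation_rmse:.4f}")
```

The point of the holdout is that several candidate curves can fit the observed points almost equally well while diverging sharply beyond them, so the extrapolation error, not the goodness of the interpolating fit, is what actually discriminates between scaling laws.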

