The ability to extrapolate from short problem instances to longer ones i...
Language models have achieved remarkable performance on a wide range of ...
We introduce the Block-Recurrent Transformer, which applies a transforme...
The test loss of well-trained neural networks often follows precise powe...
Inspired by human learning, researchers have proposed ordering examples
...
Wide neural networks have proven to be a rich class of architectures for...
Machine learning is predicated on the concept of generalization: a model...
A central challenge in developing versatile machine learning systems is
...
The choice of initial learning rate can have a profound effect on the
pe...
Though data augmentation has become a standard component of deep neural
...
Understanding the asymptotic behavior of wide networks is of considerabl...
We show that in a variety of large-scale deep learning scenarios the gra...