research ∙ 05/16/2023
Mimetic Initialization of Self-Attention Layers
It is notoriously difficult to train Transformers on small datasets; typ...
research ∙ 10/07/2022
Understanding the Covariance Structure of Convolutional Filters
Neural network weights are typically initialized at random from univaria...
research ∙ 01/24/2022
Patches Are All You Need?
Although convolutional networks have been the dominant architecture for ...
research ∙ 04/14/2021