Training algorithms, broadly construed, are an essential part of every d...
Very little is known about the training dynamics of adaptive gradient me...
We study the implicit bias of gradient flow (i.e., gradient descent with...
Coherent Gradients is a recently proposed hypothesis to explain why over...
Batch Normalization (BN) is a highly successful and widely used batch de...
To understand the dynamics of optimization in deep neural networks, we d...
Progress in deep learning is slowed by the days or weeks it takes to tra...
In recent years, data streaming has gained prominence due to advances in...