In this work, we investigate the dynamics of stochastic gradient descent...
Attention-based neural networks such as transformers have demonstrated a...
Linear classifiers and leaky ReLU networks trained by gradient flow on t...
In this work, we study the implications of the implicit bias of gradient...
In this work, we provide a characterization of the feature-learning proc...
Benign overfitting, the phenomenon where interpolating models generalize...
We consider a binary classification problem when the data comes from a m...
Although the optimization objectives for learning neural networks are hi...
We analyze the properties of adversarial training for learning adversari...
We consider a one-hidden-layer leaky ReLU network of arbitrary width tra...
We analyze the properties of gradient descent on convex surrogates for t...
We consider the problem of learning the best-fitting single neuron as me...
The skip-connections used in residual networks have become a standard ar...