Large language models display remarkable capabilities in logical and mat...
This work investigates the nuanced algorithm design choices for deep lea...
When using Stochastic Gradient Descent (SGD) for training machine learni...
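For concreteness, a minimal NumPy sketch of a mini-batch SGD update for a linear least-squares model; the model, loss, and hyperparameters here are illustrative placeholders rather than the specific setting studied above.

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.01, batch_size=32, epochs=50, seed=0):
    """Plain mini-batch SGD on the squared loss of a linear model X @ w."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        perm = rng.permutation(n)                          # reshuffle examples each epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            grad = 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)   # gradient of the mean squared error
            w -= lr * grad                                 # one SGD step
    return w

# toy usage: recover a known linear target from noiseless samples
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
w_true = np.arange(1.0, 6.0)
print(sgd_linear_regression(X, X @ w_true))
```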
Finetuning a pretrained model has become a standard approach for trainin...
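As a rough illustration of the usual finetuning recipe (freeze the pretrained features, train a new task head), a short PyTorch sketch; the backbone architecture, checkpoint path, and data below are hypothetical stand-ins, not the procedure analyzed in the work above.

```python
import torch
import torch.nn as nn

# hypothetical pretrained backbone; in practice it would be loaded from a checkpoint,
# e.g. backbone.load_state_dict(torch.load("pretrained_backbone.pt"))  # placeholder path
backbone = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 128), nn.ReLU())
for p in backbone.parameters():
    p.requires_grad = False                      # freeze the pretrained features

head = nn.Linear(128, 10)                        # new head for the downstream task
model = nn.Sequential(backbone, head)

opt = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# one illustrative update on random tensors standing in for downstream data
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
opt.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
opt.step()
print(float(loss))
```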
There is mounting empirical evidence of emergent phenomena in the capabi...
Large neural networks trained in the overparameterized regime are able t...
We study the power of learning via mini-batch stochastic gradient descen...
We study the relative power of learning with gradient descent on differe...
Several recent works have shown separation results between deep neural n...
Convolutional neural networks (CNN) exhibit unmatched performance in a m...
A supervised learning algorithm has access to a distribution of labeled...
In recent years we have seen a rapidly growing line of research which shows le...
The lottery ticket hypothesis (Frankle and Carbin, 2018) states that a...
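For orientation, a compact sketch of the iterative magnitude-pruning loop behind the lottery ticket hypothesis (train, prune the smallest surviving weights, rewind the rest to their initial values); the toy model, data, and pruning schedule are assumptions for illustration, and a full implementation would typically prune only weight matrices rather than every parameter.

```python
import copy
import torch
import torch.nn as nn

def train(model, X, y, masks=None, steps=200, lr=0.1):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()
        if masks is not None:                    # keep pruned weights pinned at zero
            with torch.no_grad():
                for n, p in model.named_parameters():
                    p.mul_(masks[n])

def find_lottery_ticket(model, X, y, rounds=3, prune_frac=0.2):
    init_state = copy.deepcopy(model.state_dict())               # the original initialization
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters()}
    for _ in range(rounds):
        train(model, X, y, masks)
        for n, p in model.named_parameters():
            surviving = (p * masks[n]).abs()
            k = int(prune_frac * masks[n].sum())                 # prune a fraction of survivors
            if k > 0:
                thresh = surviving[masks[n].bool()].kthvalue(k).values
                masks[n] = (surviving > thresh).float() * masks[n]
        model.load_state_dict(init_state)                        # rewind weights to initialization
        with torch.no_grad():
            for n, p in model.named_parameters():
                p.mul_(masks[n])                                 # apply the sparsity mask
    return masks

# toy usage on random data
X, y = torch.randn(256, 20), torch.randint(0, 2, (256,))
net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
masks = find_lottery_ticket(net, X, y)
print({n: round(float(m.mean()), 3) for n, m in masks.items()})  # fraction of weights kept
```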
Training neural-networks is computationally hard. However, in practice t...
Since its inception in the 1980s, ID3 has become one of the most success...
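To fix ideas, a bare-bones ID3-style recursion over binary features that greedily splits on the feature with the largest information gain; the helper names and the toy dataset are illustrative only, and a complete ID3 would also handle multi-valued attributes and additional stopping rules.

```python
import numpy as np

def entropy(y):
    """Shannon entropy of a binary label vector."""
    p = y.mean()
    if p in (0.0, 1.0):
        return 0.0
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def id3(X, y, features=None):
    """Grow a decision tree over binary features by greedily maximizing information gain."""
    if features is None:
        features = list(range(X.shape[1]))
    if len(features) == 0 or entropy(y) == 0.0:
        return {"leaf": int(round(y.mean()))}            # pure node or no features: majority label
    def gain(f):
        on = X[:, f] == 1
        if on.all() or (~on).all():
            return 0.0                                   # degenerate split carries no information
        return entropy(y) - (on.mean() * entropy(y[on]) + (~on).mean() * entropy(y[~on]))
    best = max(features, key=gain)
    on = X[:, best] == 1
    if on.all() or (~on).all():
        return {"leaf": int(round(y.mean()))}
    rest = [f for f in features if f != best]
    return {"feature": best,
            "if_0": id3(X[~on], y[~on], rest),
            "if_1": id3(X[on], y[on], rest)}

# toy usage: learn y = x0 OR x1 from the four possible 2-bit inputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 1])
print(id3(X, y))
```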
In recent years, there have been many attempts to understand popular heuristic...
ReLU neural-networks have been the focus of many recent theoretical w...
Understanding the power of depth in feed-forward neural networks is an o...
We describe a layer-by-layer algorithm for training deep convolutional n...
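As a generic sketch of greedy layer-wise training (optimize one block at a time against an auxiliary classifier, then freeze it and feed its outputs forward), with fully-connected blocks standing in for convolutional layers; this is not the specific algorithm or its guarantees, only the general idea under stated assumptions.

```python
import torch
import torch.nn as nn

def train_layerwise(blocks, X, y, num_classes=10, steps=200, lr=1e-2):
    """Greedy layer-wise training: each block is optimized together with a throwaway
    linear head on the task loss, then frozen; its outputs become the next block's input."""
    loss_fn = nn.CrossEntropyLoss()
    features = X
    for block in blocks:
        width = block(features[:1]).shape[1]             # output width of this block
        head = nn.Linear(width, num_classes)             # auxiliary classifier for this stage
        opt = torch.optim.SGD(list(block.parameters()) + list(head.parameters()), lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss_fn(head(block(features)), y).backward()
            opt.step()
        for p in block.parameters():
            p.requires_grad = False                      # freeze the trained block
        features = block(features).detach()              # fixed representation for the next stage
    return nn.Sequential(*blocks)

# toy usage with fully-connected blocks standing in for convolutional layers
X, y = torch.randn(512, 32), torch.randint(0, 10, (512,))
blocks = [nn.Sequential(nn.Linear(32, 64), nn.ReLU()),
          nn.Sequential(nn.Linear(64, 64), nn.ReLU())]
net = train_layerwise(blocks, X, y)
print(net)
```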