The stochastic gradient descent (SGD) algorithm is the algorithm we use ...
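For reference, the update this line of work builds on is the minibatch SGD step theta <- theta - lr * grad. A minimal sketch on a toy least-squares problem (the data, step size, and loop length are illustrative, not from the paper):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5))                             # toy design matrix
    y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=1000)   # noisy linear targets
    theta, lr, batch = np.zeros(5), 0.1, 32

    for step in range(300):
        idx = rng.choice(len(X), size=batch, replace=False)    # minibatch sampling
        grad = 2 * X[idx].T @ (X[idx] @ theta - y[idx]) / batch  # grad of mean squared error
        theta -= lr * grad                                     # theta <- theta - lr * grad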
We present a simple picture of the training process of self-supervised l...
A fundamental open problem in deep learning theory is how to define and ...
We identify and prove a general principle: L_1 sparsity can be achieved ...
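If the principle in question is the redundant-parameterization route to sparsity (my assumption; the line is truncated), the underlying identity is classical: for any factorization w = ab,

    \min_{a,\,b \;:\; ab = w} \tfrac{1}{2}\left(a^2 + b^2\right) = |w|

which follows from the AM-GM inequality, with the minimum attained at |a| = |b| = sqrt(|w|). Ordinary L_2 weight decay on the redundant factors therefore acts as an L_1 penalty on w, so plain gradient descent on a smooth objective can recover L_1-sparse solutions.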
Prevention of complete and dimensional collapse of representations has r...
This work reports deep-learning-unique first-order and second-order phas...
This work identifies the existence and cause of a type of posterior coll...
This work finds the exact solutions to a deep linear network with weight...
This work theoretically studies stochastic neural networks, a main type ...
Stochastic gradient descent (SGD) has been deployed to solve highly non-...
The main task we consider is portfolio construction in a speculative mar...
Stochastic gradient descent (SGD) undergoes complicated multiplicative n...
Adaptive gradient methods have achieved remarkable success in training d...
The noise in stochastic gradient descent (SGD), caused by minibatch samp...
As a simple and efficient optimization method in deep learning, stochast...
The natural world is abundant with concepts expressed via visual, acoust...
It has been hypothesized that label smoothing can reduce overfitting and...
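Label smoothing itself has a standard definition: the one-hot target is mixed with the uniform distribution over the K classes, y_ls = (1 - eps) * y_onehot + eps / K. A minimal sketch (the function name and eps value are illustrative):

    import numpy as np

    def smooth_labels(y_onehot, eps=0.1):
        # Label smoothing: mix the one-hot target with the uniform distribution.
        k = y_onehot.shape[-1]
        return (1.0 - eps) * y_onehot + eps / k

    y = np.eye(5)[[2]]       # one-hot target for class 2 out of 5
    print(smooth_labels(y))  # [[0.02 0.02 0.92 0.02 0.02]]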
Previous literature offers limited clues on how to learn a periodic func...
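Assuming this refers to the Snake activation proposed in that line of work (an assumption on my part, given the truncation), the fix is an activation of the form x + sin^2(a*x)/a:

    import numpy as np

    def snake(x, a=1.0):
        # Snake activation: x + sin^2(a*x)/a. The linear term preserves
        # monotone trends; the sin^2 term lets the network represent and
        # extrapolate periodic structure, which ReLU/tanh networks fail to do.
        return x + np.sin(a * x) ** 2 / a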
We propose a novel regularization method, called volumization, for neura...
Learning in the presence of label noise is a challenging yet important t...
Identifying a divergence problem in Adam, we propose a new optimizer, La...
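The truncated name is presumably LaProp. As I read its design (treat the exact form, including the omitted bias correction, as an assumption), it normalizes the gradient by the second-moment estimate first and then accumulates momentum on the normalized gradient, reversing Adam's order of operations:

    import numpy as np

    def laprop_step(theta, g, m, v, lr=4e-4, mu=0.9, nu=0.999, eps=1e-15):
        # Sketch of a LaProp-style update (my reading of the paper; bias
        # correction omitted for brevity): divide the gradient by the
        # second-moment estimate first, then apply momentum to the
        # normalized gradient. Adam does the reverse, which couples
        # momentum and adaptivity.
        v = nu * v + (1 - nu) * g**2
        m = mu * m + (1 - mu) * g / (np.sqrt(v) + eps)
        return theta - lr * m, m, v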
Federated learning is an emerging research paradigm to train models on p...
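For orientation, the canonical baseline in this setting is FedAvg (a standard reference point, not necessarily this paper's contribution): each client runs local SGD on its private shard, and the server averages the returned weights.

    import numpy as np

    def fedavg_round(global_w, client_data, lr=0.1, local_steps=5):
        # One FedAvg round (generic sketch): clients train locally on
        # private (X, y) shards; the server averages the returned weights.
        updates = []
        for X, y in client_data:
            w = global_w.copy()
            for _ in range(local_steps):
                grad = 2 * X.T @ (X @ w - y) / len(y)  # local least-squares gradient
                w -= lr * grad
            updates.append(w)
        return np.mean(updates, axis=0)  # server-side weight averaging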
We deal with the selective classification problem (supervised-learning p...
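In its simplest form, a selective classifier pairs a predictor with a rejection rule; a generic threshold-on-confidence sketch (not this paper's method) looks like:

    import numpy as np

    def selective_predict(probs, threshold=0.8):
        # Predict the argmax class, but abstain (-1) when the top
        # probability is below the threshold. The central trade-off is
        # between coverage (fraction of inputs accepted) and selective
        # risk (error rate on the accepted inputs).
        conf = probs.max(axis=-1)
        preds = probs.argmax(axis=-1)
        return np.where(conf >= threshold, preds, -1)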