We study the operator norm discrepancy of i.i.d. random matrices, initia...
Fine-tuning pre-trained language models for multiple tasks tends to be e...
Recent progress was made in characterizing the generalization error of g...
Few-shot relation extraction aims to learn to identify the relation betw...
Determining whether saddle points exist or are approximable for nonconve...
Viewing optimization methods as numerical integrators for ordinary diffe...
We study the mixing properties for stochastic accelerated gradient desce...