In this note (work in progress towards a full-length paper) we show that...
In contrast to the natural capabilities of humans to learn new tasks in ...
Recurrent Neural Networks (RNNs) offer fast inference on long sequences ...
We study the SAM (Sharpness-Aware Minimization) optimizer which has rece...
Studying the properties of stochastic noise to optimize complex non-conv...
Injecting noise within gradient descent has several desirable features. ...
Transformers have achieved remarkable success in several domains, rangin...
Injecting artificial noise into gradient descent (GD) is commonly employ...
Time series analysis is a widespread task in Natural Sciences, Social
Sc...
Gradient descent ascent (GDA), the simplest single-loop algorithm for
no...
We study the theoretical convergence properties of random-search methods...
Viewing optimization methods as numerical integrators for ordinary
diffe...
In this paper, we investigate the principle that `good explanations are ...
Derivative-free optimization (DFO) has recently gained a lot of momentum...
Ordinary differential equation (ODE) models of gradient-based optimizati...
The choice of how to retain information about past gradients dramaticall...
We propose a new continuous-time formulation for first-order stochastic
...