Aggregated Momentum: Stability Through Passive Damping

04/01/2018
by James Lucas, et al.

Momentum is a simple and widely used trick that allows gradient-based optimizers to pick up speed in low-curvature directions. Its performance depends crucially on a damping coefficient β. Large β values can potentially deliver much larger speedups, but are prone to oscillations and instability; hence one typically resorts to small values such as 0.5 or 0.9. We propose Aggregated Momentum (AggMo), a variant of momentum which combines multiple velocity vectors with different β parameters. AggMo is trivial to implement, but significantly dampens oscillations, enabling it to remain stable even for aggressive β values such as 0.999. We reinterpret Nesterov's accelerated gradient descent as a special case of AggMo and provide theoretical convergence bounds for online convex optimization. Empirically, we find that AggMo is a suitable drop-in replacement for other momentum methods, and frequently delivers faster convergence.
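
The abstract does not spell out the update rule, but a minimal sketch of the idea it describes — maintaining several velocity vectors, each with its own damping coefficient, and averaging them for the parameter step — might look like the following. The function and parameter names (aggmo_step, betas, lr) and the exact averaging form are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def aggmo_step(params, grad, velocities, betas, lr=0.1):
    """One AggMo-style update (sketch based on the abstract).

    Each velocity is damped by its own beta_i; the parameter step
    averages all K velocities. The averaging form is an assumption.
    """
    K = len(betas)
    for i, beta in enumerate(betas):
        # Each velocity accumulates the gradient at its own rate beta_i.
        velocities[i] = beta * velocities[i] - grad
    # Averaging mixes conservative and aggressive velocities, so the
    # oscillations of an aggressive beta (e.g. 0.999) are passively damped.
    params = params + (lr / K) * sum(velocities)
    return params, velocities

# Toy usage: minimize f(x) = 0.5 * x**2, whose gradient is x.
betas = [0.0, 0.9, 0.999]   # mix of small and aggressive damping coefficients
x = np.array([10.0])
vels = [np.zeros_like(x) for _ in betas]
for _ in range(200):
    x, vels = aggmo_step(x, x.copy(), vels, betas, lr=0.1)
print(x)  # close to the minimum at 0
```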
