On Convergence of Distributed Approximate Newton Methods: Globalization, Sharper Bounds and Beyond

08/06/2019
by   Xiao-Tong Yuan, et al.

The DANE algorithm is an approximate Newton method widely used for communication-efficient distributed machine learning. Its appeal lies in its scalability and versatility. The convergence of DANE, however, can be tricky to analyze: its appealing convergence rate has been rigorously established only for quadratic objectives, and for more general convex functions the known results are no stronger than those of classic first-order methods. To remedy these drawbacks, we propose in this paper new alternatives to DANE that are more amenable to analysis. We first introduce a simple variant of DANE equipped with backtracking line search, for which global asymptotic convergence and sharper local non-asymptotic convergence rate guarantees can be proved for both quadratic and non-quadratic strongly convex functions. We then propose a heavy-ball method to accelerate the convergence of DANE, showing that a nearly tight local rate of convergence can be established for strongly convex functions and that, with a proper modification of the algorithm, the same result applies globally to linear prediction models. Numerical evidence is provided to confirm the theoretical and practical advantages of our methods.
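
To make the idea of a line-search variant concrete, the following is a minimal sketch and not the algorithm analyzed in the paper. It assumes each worker holds a diagonal quadratic loss f_i(w) = 0.5 * sum_j a_i[j] * w[j]^2 - sum_j b_i[j] * w[j], so that the regularized local subproblem of a DANE-style step has the closed-form solution -(a_i + mu)^{-1} g. The averaged local solutions give a search direction, and a standard Armijo backtracking rule picks the step size. All function names, the data, and the parameters mu, c, and shrink are hypothetical choices for illustration.

```python
# Sketch only: DANE-style averaged direction plus Armijo backtracking
# on hypothetical diagonal quadratic local losses (pure Python, no dependencies).

def global_loss(w, A, B):
    """Average of the local losses f_i(w) = 0.5*sum(a*w^2) - sum(b*w)."""
    total = 0.0
    for a, b in zip(A, B):
        total += sum(0.5 * a[j] * w[j] ** 2 - b[j] * w[j] for j in range(len(w)))
    return total / len(A)

def global_grad(w, A, B):
    """Gradient of the averaged loss (one round of communication)."""
    m = len(A)
    return [sum(a[j] * w[j] - b[j] for a, b in zip(A, B)) / m
            for j in range(len(w))]

def averaged_direction(g, A, mu):
    """Each worker returns -(a_i + mu)^{-1} g; the master averages them."""
    m = len(A)
    return [-sum(g[j] / (a[j] + mu) for a in A) / m for j in range(len(g))]

def solve_with_line_search(w, A, B, mu=0.1, c=1e-4, shrink=0.5,
                           max_iters=50, tol=1e-8):
    """Iterate: averaged direction, then Armijo backtracking on the step size."""
    for _ in range(max_iters):
        g = global_grad(w, A, B)
        if sum(x * x for x in g) ** 0.5 < tol:
            break
        d = averaged_direction(g, A, mu)
        slope = sum(gj * dj for gj, dj in zip(g, d))  # negative: d is a descent direction
        f0 = global_loss(w, A, B)
        step = 1.0
        # Shrink the step until the sufficient-decrease condition holds;
        # the lower bound on step guards against floating-point stalls.
        while step > 1e-12:
            trial = [wj + step * dj for wj, dj in zip(w, d)]
            if global_loss(trial, A, B) <= f0 + c * step * slope:
                break
            step *= shrink
        w = [wj + step * dj for wj, dj in zip(w, d)]
    return w

# Two hypothetical workers, two parameters.
A = [[1.0, 4.0], [3.0, 2.0]]   # local curvatures
B = [[1.0, 2.0], [3.0, 0.0]]   # local linear terms
w_hat = solve_with_line_search([0.0, 0.0], A, B)
```

For these data the minimizer of the averaged loss is w[j] = mean(b[j]) / mean(a[j]), that is, (1, 1/3), which the iterates approach. The line search evaluates only the global loss, so each backtracking trial would cost one extra scalar reduction across workers in a distributed setting.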

