Communication-Efficient Federated Learning Using Censored Heavy Ball Descent

by   Yicheng Chen, et al.

Distributed machine learning enables scalability and computational offloading, but requires significant levels of communication. Consequently, communication efficiency in distributed learning settings is an important consideration, especially when the communications are wireless and battery-driven devices are employed. In this paper we develop a censoring-based heavy ball (CHB) method for distributed learning in a server-worker architecture. Each worker self-censors unless its local gradient is sufficiently different from the previously transmitted one. The significant practical advantages of the HB method for learning problems are well known, but the question of reducing communications has not been addressed. CHB takes advantage of the HB smoothing to eliminate reporting small changes, and provably achieves a linear convergence rate equivalent to that of the classical HB method for smooth and strongly convex objective functions. The convergence guarantee of CHB is theoretically justified for both convex and nonconvex cases. In addition we prove that, under some conditions, at least half of all communications can be eliminated without any impact on convergence rate. Extensive numerical results validate the communication efficiency of CHB on both synthetic and real datasets, for convex, nonconvex, and nondifferentiable cases. Given a target accuracy, CHB can significantly reduce the number of communications compared to existing algorithms, achieving the same accuracy without slowing down the optimization process.


page 1

page 4


Distributed Learning With Sparsified Gradient Differences

A very large number of communications are typically required to solve di...

Intermittent Pulling with Local Compensation for Communication-Efficient Federated Learning

Federated Learning is a powerful machine learning paradigm to cooperativ...

Communication-Efficient Distributed Learning via Lazily Aggregated Quantized Gradients

The present paper develops a novel aggregated gradient approach for dist...

On Convergence of Distributed Approximate Newton Methods: Globalization, Sharper Bounds and Beyond

The DANE algorithm is an approximate Newton method popularly used for co...

LAG: Lazily Aggregated Gradient for Communication-Efficient Distributed Learning

This paper presents a new class of gradient methods for distributed mach...

Accelerated Gradient Descent Learning over Multiple Access Fading Channels

We consider a distributed learning problem in a wireless network, consis...

Aggregation in the Mirror Space (AIMS): Fast, Accurate Distributed Machine Learning in Military Settings

Distributed machine learning (DML) can be an important capability for mo...

Please sign up or login with your details

Forgot password? Click here to reset