SySCD: A System-Aware Parallel Coordinate Descent Algorithm

11/18/2019
by Nikolas Ioannou, et al.

In this paper we propose a novel parallel stochastic coordinate descent (SCD) algorithm with convergence guarantees that exhibits strong scalability. We start by studying a state-of-the-art parallel implementation of SCD and identify its scalability and system-level performance bottlenecks. We then take a principled approach to develop a new SCD variant designed to avoid the identified bottlenecks, such as limited scaling caused by the coherence traffic of sharing the model across threads, and inefficient CPU cache accesses. Our proposed system-aware parallel coordinate descent algorithm (SySCD) scales to many cores and across NUMA nodes, and offers a consistent bottom-line speedup in training time of up to 12× compared to an optimized asynchronous parallel SCD algorithm and up to 42× compared to state-of-the-art GLM solvers (scikit-learn, Vowpal Wabbit, and H2O) on a range of datasets and multi-core CPU architectures.
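For readers unfamiliar with the baseline, the following is a minimal sketch of plain sequential SCD applied to ridge regression. It is not the SySCD algorithm and is not taken from the paper; the objective, the function name `scd_ridge`, and the parameters `columns`, `lam`, `epochs`, and `seed` are assumptions chosen for illustration. The sketch maintains a residual vector that every coordinate update reads and writes; in a multi-threaded variant, state of this kind would have to be shared or replicated across threads, which is where the coherence traffic mentioned in the abstract would plausibly arise.

```python
import random


def scd_ridge(columns, y, lam=0.1, epochs=50, seed=0):
    """Sequential SCD for min_w (1/(2n)) * ||Xw - y||^2 + (lam/2) * ||w||^2.

    columns[j] is the j-th feature column of X (a list of n floats).
    Illustrative only: assumes lam > 0 so the curvature is never zero.
    """
    n, d = len(y), len(columns)
    rng = random.Random(seed)
    w = [0.0] * d
    # residual = Xw - y; with w = 0 this is simply -y.
    residual = [-yi for yi in y]
    # Squared column norms are fixed, so compute them once.
    sq_norms = [sum(x * x for x in col) for col in columns]

    for _ in range(epochs):
        order = list(range(d))
        rng.shuffle(order)  # visit coordinates in a fresh random order
        for j in order:
            col = columns[j]
            # Partial derivative of the objective with respect to w[j].
            grad = sum(c * r for c, r in zip(col, residual)) / n + lam * w[j]
            # Second derivative along coordinate j (constant for ridge).
            curvature = sq_norms[j] / n + lam
            # Exact minimizer along coordinate j.
            delta = -grad / curvature
            w[j] += delta
            # Keep the residual consistent with the updated model.
            for i in range(n):
                residual[i] += delta * col[i]
    return w
```

A call such as `scd_ridge([[1.0, 0.0, 1.0, 2.0], [0.0, 1.0, 1.0, -1.0]], [2.0, -1.0, 1.0, 5.0], lam=1e-8, epochs=200)` fits a two-feature model whose targets were generated as `2*x1 - x2`, so the returned weights should be close to `[2, -1]`.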


research
11/05/2018

Parallel training of linear models without compromising convergence

In this paper we analyze, evaluate, and improve the performance of train...
research
11/13/2018

Parallel Stochastic Asynchronous Coordinate Descent: Tight Bounds on the Possible Parallelism

Several works have shown linear speedup is achieved by an asynchronous p...
research
06/27/2012

Scaling Up Coordinate Descent Algorithms for Large ℓ_1 Regularization Problems

We present a generic framework for parallel coordinate descent (CD) algo...
research
10/07/2013

Parallel coordinate descent for the Adaboost problem

We design a randomised parallel version of Adaboost based on previous st...
research
08/15/2018

An Analysis of Asynchronous Stochastic Accelerated Coordinate Descent

Gradient descent, and coordinate descent in particular, are core tools i...
research
01/09/2018

Distributed Deep Reinforcement Learning: Learn how to play Atari games in 21 minutes

We present a study in Distributed Deep Reinforcement Learning (DDRL) foc...
research
06/03/2023

Optimized Vectorization Implementation of CRYSTALS-Dilithium

CRYSTALS-Dilithium is a lattice-based signature scheme to be standardize...
