Randomized Block-Diagonal Preconditioning for Parallel Learning

We study preconditioned gradient-based optimization methods in which the preconditioning matrix has block-diagonal form. This structural constraint has the advantage that the update computation can be parallelized across multiple independent tasks. Our main contribution is to demonstrate that the convergence of these methods can be significantly improved by a randomization technique that repartitions coordinates across tasks during the optimization procedure. We provide a theoretical analysis that accurately characterizes the expected convergence gains of repartitioning and validate our findings empirically on various traditional machine learning tasks. From an implementation perspective, block-separable models are well suited to parallelization and, when shared memory is available, randomization can be implemented very efficiently on top of existing methods to improve convergence.
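
To make the update rule concrete, the following is a minimal sketch of a block-diagonal preconditioned gradient step with randomized repartitioning, written for a synthetic quadratic objective. It is an illustration under assumed choices (NumPy, the problem size, num_blocks, and the conservative 1/num_blocks step size), not the paper's algorithm or reference code.

import numpy as np

rng = np.random.default_rng(0)
d, num_blocks = 12, 3

# Synthetic strongly convex quadratic: f(x) = 0.5 * x^T A x - b^T x.
M = rng.standard_normal((d, d))
A = M @ M.T + 0.1 * np.eye(d)
b = rng.standard_normal(d)

x = np.zeros(d)
step_size = 1.0 / num_blocks  # conservative choice; guarantees descent for SPD A
for it in range(200):
    grad = A @ x - b
    # Randomized repartitioning: shuffle coordinates and split them into blocks.
    blocks = np.array_split(rng.permutation(d), num_blocks)
    direction = np.zeros(d)
    for idx in blocks:  # each block could be handled by an independent task
        A_block = A[np.ix_(idx, idx)]                         # block of the curvature matrix
        direction[idx] = np.linalg.solve(A_block, grad[idx])  # block-preconditioned gradient
    x -= step_size * direction

print("final objective:", 0.5 * x @ A @ x - b @ x)

In this sketch, the inner loop over blocks is the part that could run as independent parallel tasks; the randomization itself only requires drawing a fresh permutation of coordinate indices at each iteration, which is cheap when shared memory is available.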

Related research

04/30/2018 · optimParallel: an R Package Providing Parallel Versions of the Gradient-Based Optimization Methods of optim()
The R package optimParallel provides a parallel version of the gradient-...

05/26/2019 · Stochastic Gradient Methods with Block Diagonal Matrix Adaptation
Adaptive gradient approaches that automatically adjust the learning rate...

02/05/2019 · A Modular Approach to Block-diagonal Hessian Approximations for Second-order Optimization Methods
We propose a modular extension of the backpropagation algorithm for comp...

03/17/2020 · Diagonal Preconditioning: Theory and Algorithms
Diagonal preconditioning has been a staple technique in optimization and...

09/20/2019 · Trivializations for Gradient-Based Optimization on Manifolds
We introduce a framework to study the transformation of problems with ma...

09/12/2023 · A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale
Shampoo is an online and stochastic optimization algorithm belonging to ...

09/22/2014 · Parallel and Distributed Block-Coordinate Frank-Wolfe Algorithms
We develop parallel and distributed Frank-Wolfe algorithms; the former o...
