Leveraging Non-uniformity in First-order Non-convex Optimization

05/13/2021
by Jincheng Mei, et al.

Classical global convergence results for first-order methods rely on uniform smoothness and the Łojasiewicz inequality. Motivated by properties of objective functions that arise in machine learning, we propose a non-uniform refinement of these notions, leading to Non-uniform Smoothness (NS) and Non-uniform Łojasiewicz inequality (NŁ). The new definitions inspire new geometry-aware first-order methods that are able to converge to global optimality faster than the classical Ω(1/t^2) lower bounds. To illustrate the power of these geometry-aware methods and their corresponding non-uniform analysis, we consider two important problems in machine learning: policy gradient optimization in reinforcement learning (PG), and generalized linear model training in supervised learning (GLM). For PG, we find that normalizing the gradient ascent method can accelerate convergence to O(e^-t) while incurring less overhead than existing algorithms. For GLM, we show that geometry-aware normalized gradient descent can also achieve a linear convergence rate, which significantly improves the best known results. We additionally show that the proposed geometry-aware descent methods escape landscape plateaus faster than standard gradient descent. Experimental results are used to illustrate and complement the theoretical findings.
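The two notions at the center of the abstract can be sketched in the abstract's own plain notation (a paraphrase consistent with the abstract, not a verbatim statement from the paper): a function f satisfies Non-uniform Smoothness (NS) with parameter function β(θ) if the Hessian norm is bounded pointwise, ||∇^2 f(θ)|| ≤ β(θ), and it satisfies the Non-uniform Łojasiewicz (NŁ) inequality of degree ξ with coefficient C(θ) > 0 if ||∇f(θ)|| ≥ C(θ) · |f(θ) − f*|^(1−ξ), where f* is the optimal value. Taking β(θ) and C(θ) to be constants recovers the classical uniform conditions; for instance, constant C and ξ = 1/2 give the familiar Polyak-Łojasiewicz inequality.

The "normalized" methods the abstract refers to divide each update by a local measure of the geometry. Below is a minimal Python/NumPy sketch of the simplest such variant, plain gradient-norm normalization with a fixed step size eta; the function name, the toy objective, and all parameter values are illustrative assumptions, not the paper's exact algorithms, which adapt the normalization to the NS parameter β(θ).

    import numpy as np

    def normalized_gradient_descent(grad_f, theta0, eta=0.1, num_steps=100, eps=1e-12):
        """Gradient descent with each step normalized by the gradient norm.

        Normalizing keeps a fixed step length even where the raw gradient is
        tiny, which is why such methods escape flat plateaus faster than
        standard gradient descent (one of the abstract's claims).
        """
        theta = np.asarray(theta0, dtype=float)
        for _ in range(num_steps):
            g = grad_f(theta)
            norm = np.linalg.norm(g)
            if norm < eps:  # (near-)stationary point: stop
                break
            theta = theta - eta * g / norm
        return theta

    # Toy usage on f(theta) = ||theta||^2 / 2, whose gradient is theta itself.
    theta_hat = normalized_gradient_descent(lambda th: th, theta0=np.ones(3))

With a fixed eta this simplest variant only converges to within roughly eta of a minimizer; the paper's geometry-aware schemes instead scale the step by local quantities such as β(θ) so that fast convergence to global optimality is preserved.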


