A Stochastic Proximal Polyak Step Size

01/12/2023
by Fabian Schaipp, et al.

Recently, the stochastic Polyak step size (SPS) has emerged as a competitive adaptive step size scheme for stochastic gradient descent. Here we develop ProxSPS, a proximal variant of SPS that can handle regularization terms. Developing a proximal variant of SPS is particularly important, since SPS requires a lower bound of the objective function to work well. When the objective function is the sum of a loss and a regularizer, available estimates of a lower bound of the sum can be loose. In contrast, ProxSPS only requires a lower bound for the loss, which is often readily available. As a consequence, we show that ProxSPS is easier to tune and more stable in the presence of regularization. Furthermore, for image classification tasks, ProxSPS performs as well as AdamW with little to no tuning, and results in a network with smaller weight parameters. We also provide an extensive convergence analysis for ProxSPS that covers the non-smooth, smooth, weakly convex and strongly convex settings.
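
To make the distinction in the abstract concrete, here is a minimal NumPy sketch of the idea (our illustration, not the authors' reference implementation): a plain SPS step computes a capped Polyak step size from the sampled objective and its assumed lower bound, whereas a ProxSPS-style step computes that step size from the loss alone and then applies the proximal operator of the regularizer, shown here for squared l2, whose prox has a closed form. The function names, the cap gamma_max, and the constant c are illustrative assumptions following the common SPS_max formulation; the paper's exact update rule may differ.

```python
import numpy as np

def sps_step(x, loss, grad, lower_bound=0.0, gamma_max=1.0, c=0.5, eps=1e-12):
    """One SPS_max-style step: Polyak step size on the sampled objective, capped at gamma_max."""
    g = grad(x)
    gamma = min(gamma_max, (loss(x) - lower_bound) / (c * float(g @ g) + eps))
    return x - gamma * g

def prox_sq_l2(z, lam, gamma):
    """Closed-form prox of the squared-l2 regularizer (lam/2)*||x||^2 with step size gamma."""
    return z / (1.0 + gamma * lam)

def proxsps_style_step(x, loss, grad, lam, lower_bound=0.0, gamma_max=1.0, c=0.5, eps=1e-12):
    """ProxSPS-style step (sketch): Polyak step size computed from the loss alone,
    followed by a proximal step on the regularizer."""
    g = grad(x)
    gamma = min(gamma_max, (loss(x) - lower_bound) / (c * float(g @ g) + eps))
    return prox_sq_l2(x - gamma * g, lam, gamma)

# Tiny usage example: one sampled least-squares loss f(x) = 0.5*(a^T x - b)^2,
# whose natural lower bound is 0 (the value SPS-type methods typically plug in).
rng = np.random.default_rng(0)
a, b = rng.normal(size=5), 1.0
loss = lambda x: 0.5 * (a @ x - b) ** 2
grad = lambda x: (a @ x - b) * a
x = np.zeros(5)
x = proxsps_style_step(x, loss, grad, lam=0.1)
```

The point made in the abstract is visible in the sketch: the lower bound entering the step size is a bound on the loss only, which is often readily available (simply zero for non-negative losses), rather than a bound on the sum of loss and regularizer.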

