On convergence rate of stochastic proximal point algorithm without strong convexity, smoothness or bounded gradients

01/22/2019
by   Andrei Patrascu, et al.

Significant parts of the recent learning literature on stochastic optimization algorithms have focused on the theoretical and practical behaviour of stochastic first-order schemes under different convexity properties. Due to its simplicity, the traditional method of choice for most supervised machine learning problems is the stochastic gradient descent (SGD) method. Many iteration improvements and accelerations have been added to pure SGD in order to boost its convergence in various (strongly) convex settings. However, Lipschitz gradient continuity or bounded gradients assumptions are an essential requirement for most existing stochastic first-order schemes. In this paper, novel convergence results are presented for the stochastic proximal point algorithm in different settings. In particular, without any strong convexity, smoothness, or bounded gradients assumptions, we show that a slightly modified quadratic growth assumption is sufficient to guarantee an O(1/k) convergence rate for the stochastic proximal point algorithm, in terms of the distance to the optimal set. Furthermore, linear convergence is obtained in the interpolation setting, i.e., when the optimal set of the expected cost is included in the optimal sets of each functional component.
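To make the iteration concrete, below is a minimal sketch of the stochastic proximal point update, where each step solves a proximal subproblem on a single sampled component rather than taking a gradient step. The least-squares component, the uniform sampling, and the vanishing step size mu_k = mu0/k are illustrative assumptions for this sketch, not the exact setting or step-size schedule analyzed in the paper.

```python
import numpy as np

def spp_least_squares(A, b, x0, mu0=1.0, iters=5000, seed=0):
    """Stochastic proximal point sketch for f(x) = E_i[ 0.5*(a_i^T x - b_i)^2 ].

    Each iteration solves the proximal subproblem on one sampled component,
        x_{k+1} = argmin_z 0.5*(a_i^T z - b_i)^2 + (1/(2*mu_k)) * ||z - x_k||^2,
    which for a scalar least-squares term admits the closed form used below.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    x = x0.astype(float).copy()
    for k in range(1, iters + 1):
        i = rng.integers(n)            # sample one functional component uniformly
        a, bi = A[i], b[i]
        mu_k = mu0 / k                 # illustrative vanishing step size (assumption)
        resid = a @ x - bi
        # Exact proximal step for the sampled least-squares component
        x = x - mu_k * resid / (1.0 + mu_k * (a @ a)) * a
    return x

# Usage on a synthetic consistent system, i.e., an interpolation-type problem
# where every component is minimized at the same point x_true.
A = np.random.randn(200, 10)
x_true = np.random.randn(10)
b = A @ x_true
x_hat = spp_least_squares(A, b, x0=np.zeros(10))
print(np.linalg.norm(x_hat - x_true))
```

Unlike an SGD step, the update remains well defined and stable even when the component gradients are unbounded, since each subproblem is solved exactly; this is the structural feature that the paper's analysis exploits.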
