One Sample Stochastic Frank-Wolfe

10/10/2019
by Mingrui Zhang et al.

One of the beauties of the projected gradient descent method lies in its rather simple mechanism and yet stable behavior with inexact, stochastic gradients, which has led to its widespread use in many machine learning applications. However, once we replace the projection operator with a simpler linear program, as is done in the Frank-Wolfe method, both simplicity and stability take a serious hit. The aim of this paper is to bring them back without sacrificing efficiency. We propose the first one-sample stochastic Frank-Wolfe algorithm, called 1-SFW, that avoids the need to carefully tune the batch size, step size, and other complicated hyperparameters. In particular, 1-SFW achieves the optimal convergence rate of O(1/ϵ^2) for reaching an ϵ-suboptimal solution in the stochastic convex setting, and a ((1-1/e)-ϵ)-approximate solution for a stochastic monotone DR-submodular maximization problem. Moreover, in the general non-convex setting, 1-SFW finds an ϵ-first-order stationary point after at most O(1/ϵ^3) iterations, matching the best known convergence rate. All of this is made possible by a novel unbiased momentum estimator that governs the stability of the optimization process while using only a single sample at each iteration.
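To make the one-sample mechanism concrete, below is a minimal Python sketch of a stochastic Frank-Wolfe loop in this spirit. It is a simplified stand-in for the paper's method: it uses the plain momentum average d_t = (1 - rho_t) * d_{t-1} + rho_t * g_t rather than the paper's unbiased variance-reduced estimator, and the l1-ball oracle, the rho and step-size schedules, and the least-squares toy problem are all illustrative assumptions rather than details taken from the paper.

import numpy as np

def lmo_l1_ball(g, radius=1.0):
    # Linear minimization oracle over the l1 ball:
    # argmin_{||v||_1 <= radius} <g, v> puts all mass on the
    # coordinate with the largest |g_i|, with opposite sign.
    i = int(np.argmax(np.abs(g)))
    v = np.zeros_like(g)
    v[i] = -radius * np.sign(g[i])
    return v

def one_sample_sfw(stoch_grad, x0, T, lmo, rho0=1.0, seed=0):
    # One-sample stochastic Frank-Wolfe sketch: one fresh sample and
    # one linear minimization oracle call per iteration, no projection.
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    d = np.zeros_like(x)  # momentum-averaged gradient estimate
    for t in range(1, T + 1):
        rho = min(1.0, rho0 / t ** (2.0 / 3.0))  # illustrative decaying momentum weight
        g = stoch_grad(x, rng)                   # single stochastic gradient sample
        d = (1.0 - rho) * d + rho * g            # momentum update of the estimate
        v = lmo(d)                               # Frank-Wolfe direction from the LMO
        eta = 2.0 / (t + 2)                      # classic Frank-Wolfe step size
        x = x + eta * (v - x)                    # convex combination stays feasible
    return x

# Toy usage (hypothetical problem): stochastic least squares over the l1 ball.
data_rng = np.random.default_rng(1)
A = data_rng.normal(size=(1000, 20))
x_star = np.zeros(20)
x_star[0] = 0.5  # sparse target inside the unit l1 ball
b = A @ x_star + 0.01 * data_rng.normal(size=1000)

def stoch_grad(x, rng):
    i = rng.integers(A.shape[0])        # draw one data point per iteration
    a = A[i]
    return 2.0 * (a @ x - b[i]) * a     # gradient of (a^T x - b_i)^2

x_hat = one_sample_sfw(stoch_grad, np.zeros(20), T=20000, lmo=lmo_l1_ball)

The structural point matches the abstract: each iteration draws exactly one fresh sample and makes one linear-program (LMO) call, with no projection step and no minibatch size to tune.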


Related research

Convergence and Stability of the Stochastic Proximal Point Algorithm with Momentum (11/11/2021)
Stochastic gradient descent with momentum (SGDM) is the dominant algorit...

Fast Stochastic Variance Reduced Gradient Method with Momentum Acceleration for Machine Learning (03/23/2017)
Recently, research on accelerated stochastic gradient descent methods (e...

MSTGD: A Memory Stochastic sTratified Gradient Descent Method with an Exponential Convergence Rate (02/21/2022)
The fluctuation effect of gradient expectation and variance caused by pa...

Scalable Projection-Free Optimization (05/07/2021)
As a projection-free algorithm, Frank-Wolfe (FW) method, also known as c...

Stochastic Conditional Gradient Methods: From Convex Minimization to Submodular Maximization (04/24/2018)
This paper considers stochastic optimization problems for a large class ...

Stochastic Non-convex Ordinal Embedding with Stabilized Barzilai-Borwein Step Size (11/17/2017)
Learning representation from relative similarity comparisons, often call...

Simple and optimal methods for stochastic variational inequalities, II: Markovian noise and policy evaluation in reinforcement learning (11/15/2020)
The focus of this paper is on stochastic variational inequalities (VI) u...
