Accelerated Zeroth-Order Momentum Methods from Mini to Minimax Optimization

by Feihu Huang, et al.

In this paper, we propose a new accelerated zeroth-order momentum (Acc-ZOM) method to solve non-convex stochastic mini-optimization problems. We prove that the Acc-ZOM method achieves a lower query complexity of O(d^{3/4} ϵ^{-3}) for finding an ϵ-stationary point, which improves the best known result by a factor of O(d^{1/4}), where d denotes the parameter dimension. The Acc-ZOM method does not require large batches, unlike existing zeroth-order stochastic algorithms. Further, we extend the Acc-ZOM method to non-convex stochastic minimax-optimization problems and propose an accelerated zeroth-order momentum descent ascent (Acc-ZOMDA) method. We prove that the Acc-ZOMDA method reaches the best known query complexity of Õ(κ_y^3 (d_1+d_2)^{3/2} ϵ^{-3}) for finding an ϵ-stationary point, where d_1 and d_2 denote the dimensions of the mini and max optimization parameters, respectively, and κ_y is the condition number. In particular, our theoretical result does not rely on the large batches required by existing methods. Moreover, we propose a momentum-based accelerated framework for minimax-optimization problems. Within this framework, we present an accelerated momentum descent ascent (Acc-MDA) method for solving white-box minimax problems, and prove that it achieves the best known gradient complexity of Õ(κ_y^3 ϵ^{-3}) without large batches. Extensive experimental results on black-box adversarial attacks against deep neural networks (DNNs) and on poisoning attacks demonstrate the efficiency of our algorithms.
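To make the idea of a zeroth-order momentum update concrete, the following is a minimal sketch and not the authors' Acc-ZOM algorithm. It assumes a two-point gradient estimator along a random unit direction and a variance-reduced momentum recursion of the form v_new = g(x_new) + (1 - alpha) * (v - g(x_old)), with one function sample per step instead of a large batch. The function names, the constant step size `lr`, the momentum weight `alpha`, the smoothing radius `mu`, and the quadratic test objective are all illustrative assumptions; the abstract specifies none of them.

```python
import math
import random


def unit_direction(dim, rng):
    """Draw a direction uniformly from the unit sphere in R^dim."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in g))
    return [c / norm for c in g]


def zo_gradient(f, x, u, mu):
    """Two-point zeroth-order estimate: d * (f(x + mu*u) - f(x)) / mu * u."""
    shifted = [xi + mu * ui for xi, ui in zip(x, u)]
    scale = len(x) * (f(shifted) - f(x)) / mu
    return [scale * ui for ui in u]


def zo_momentum_minimize(f, x0, steps=2000, lr=0.01, alpha=0.1, mu=1e-4, seed=0):
    """Minimize f using only function values (hypothetical helper)."""
    rng = random.Random(seed)
    x = list(x0)
    # Initialize the momentum estimate with a single zeroth-order gradient.
    v = zo_gradient(f, x, unit_direction(len(x), rng), mu)
    for _ in range(steps):
        x_new = [xi - lr * vi for xi, vi in zip(x, v)]
        # Reuse the same random direction at both points so the
        # difference g_new - g_old has low variance.
        u = unit_direction(len(x), rng)
        g_new = zo_gradient(f, x_new, u, mu)
        g_old = zo_gradient(f, x, u, mu)
        v = [gn + (1.0 - alpha) * (vi - go)
             for gn, vi, go in zip(g_new, v, g_old)]
        x = x_new
    return x


def quadratic(x):
    """Illustrative smooth objective with its minimum at (1, ..., 1)."""
    return sum((xi - 1.0) ** 2 for xi in x)


if __name__ == "__main__":
    x_final = zo_momentum_minimize(quadratic, [0.0] * 5)
    print("final objective:", quadratic(x_final))
```

Each iteration queries the objective four times and never evaluates a gradient, which is the black-box setting the abstract targets. The fixed `lr` and `alpha` are a simplification chosen here for readability.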




Gradient Descent Ascent for Min-Max Problems on Riemannian Manifold

In the paper, we study a class of useful non-convex minimax optimization...

AdaGDA: Faster Adaptive Gradient Descent Ascent Methods for Minimax Optimization

In the paper, we propose a class of faster adaptive gradient descent asc...

Accelerating Inexact HyperGradient Descent for Bilevel Optimization

We present a method for solving general nonconvex-strongly-convex bileve...

Enhanced Bilevel Optimization via Bregman Distance

Bilevel optimization has been widely applied many machine learning probl...

On the Application of Danskin's Theorem to Derivative-Free Minimax Optimization

Motivated by Danskin's theorem, gradient-based methods have been applied...

Direct Acceleration of SAGA using Sampled Negative Momentum

Variance reduction is a simple and effective technique that accelerates ...

MaSS: an Accelerated Stochastic Method for Over-parametrized Learning

In this paper we introduce MaSS (Momentum-added Stochastic Solver), an a...
