On the Convergence of Perturbed Distributed Asynchronous Stochastic Gradient Descent to Second Order Stationary Points in Non-convex Optimization
In this paper, we systematically study the second-order convergence of the asynchronous stochastic gradient descent (ASGD) algorithm in non-convex optimization. We investigate the behavior of ASGD near and away from saddle points and show that, unlike general stochastic gradient descent (SGD), ASGD may return after escaping a saddle point, yet after staying near a saddle point for a long enough time (O(T)), ASGD will eventually move away from strict saddle points. An inequality is given to describe the process by which ASGD escapes from saddle points. We show the exponential instability of the perturbed gradient dynamics near strict saddle points and use a novel Razumikhin-Lyapunov method to give a more detailed estimate of how the time delay parameter T influences the speed of escape. In particular, we consider the optimization of smooth non-convex functions and propose a perturbed asynchronous stochastic gradient descent algorithm that is guaranteed to converge to second-order stationary points with high probability in O(1/ϵ^4) iterations. To the best of our knowledge, this is the first work on the second-order convergence of asynchronous algorithms.
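To make the setting concrete, the following is a minimal illustrative sketch, not the algorithm proposed in the paper. It simulates delayed (stale) gradient steps with an added isotropic perturbation on a toy function with a strict saddle at the origin. The test function, the uniformly random delay bounded by `max_delay` (standing in for T), the Gaussian noise model, and all parameter values are assumptions chosen for illustration only.

```python
import random


def grad(point):
    """Gradient of the toy function f(x, y) = x^2/2 - y^2/2 + y^4/4.

    The origin is a strict saddle (Hessian eigenvalues +1 and -1);
    the minimizers are (0, +1) and (0, -1).
    """
    x, y = point
    return (x, -y + y ** 3)


def perturbed_delayed_sgd(start, step=0.05, max_delay=5, grad_noise=0.01,
                          perturbation=0.01, iters=3000, seed=0):
    """Run gradient steps that use a stale iterate plus a small perturbation.

    All names and defaults here are hypothetical, chosen for illustration.
    """
    rng = random.Random(seed)
    history = [tuple(start)]
    for _ in range(iters):
        # Asynchrony model (assumption): the gradient is evaluated at an
        # iterate that is between 0 and max_delay steps old.
        delay = rng.randint(0, max_delay)
        stale = history[max(0, len(history) - 1 - delay)]
        gx, gy = grad(stale)
        # Stochastic gradient noise plus an explicit isotropic perturbation.
        gx += rng.gauss(0.0, grad_noise) + rng.gauss(0.0, perturbation)
        gy += rng.gauss(0.0, grad_noise) + rng.gauss(0.0, perturbation)
        x, y = history[-1]
        history.append((x - step * gx, y - step * gy))
    return history[-1]


# Start on the stable manifold of the saddle (y = 0), where exact,
# noise-free gradient descent would converge to the saddle itself.
final_x, final_y = perturbed_delayed_sgd((0.5, 0.0))
print(final_x, final_y)
```

In this toy run the noise pushes the iterate off the line y = 0, after which the unstable direction of the saddle carries it toward one of the two minimizers. The sketch does not reproduce the paper's escape-time analysis or its O(1/ϵ^4) guarantee.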