Stochastic gradient descent algorithms for strongly convex functions at O(1/T) convergence rates
With a weighting scheme proportional to t, a traditional stochastic gradient descent (SGD) algorithm achieves a high-probability convergence rate of O(κ/T) for strongly convex functions, instead of O(κ ln(T)/T). We further prove that an accelerated SGD algorithm also achieves a rate of O(κ/T).
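To illustrate the idea of averaging iterates with weights proportional to t, here is a minimal sketch of SGD with a running weighted average. This is not the paper's exact algorithm: the step-size schedule, the toy quadratic objective, and the interpretation of κ as the condition number L/μ are assumptions chosen for illustration; only the weighting w_t ∝ t comes from the abstract.

```python
import numpy as np

def weighted_avg_sgd(grad, x0, mu, L, T, rng):
    """SGD returning an average of iterates weighted proportionally to t.

    Sketch only: the 1/(mu*t + L) step size is an assumed schedule of the
    usual O(1/(mu*t)) form for strongly convex problems, not the paper's.
    """
    x = x0.copy()
    x_avg = np.zeros_like(x0)
    weight_sum = 0.0
    for t in range(1, T + 1):
        eta = 1.0 / (mu * t + L)            # assumed step-size schedule
        x = x - eta * grad(x, rng)          # stochastic gradient step
        w = float(t)                        # weight proportional to t
        weight_sum += w
        x_avg += (w / weight_sum) * (x - x_avg)  # running weighted average
    return x_avg

if __name__ == "__main__":
    # Toy strongly convex objective f(x) = 0.5 * mu * ||x||^2 with noisy gradients.
    rng = np.random.default_rng(0)
    mu, L, dim = 1.0, 1.0, 5

    def grad(x, rng):
        return mu * x + 0.1 * rng.standard_normal(x.shape)

    x_hat = weighted_avg_sgd(grad, np.ones(dim), mu, L, T=10_000, rng=rng)
    print("distance to optimum:", np.linalg.norm(x_hat))
```

The running average update keeps only O(d) extra memory: after step t, x_avg equals the t-weighted mean of all iterates seen so far, which is the quantity whose suboptimality the O(κ/T) bound would concern.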