Statistical Inference for Model Parameters in Stochastic Gradient Descent

by Xi Chen, et al.

The stochastic gradient descent (SGD) algorithm has been widely used in statistical estimation for large-scale data due to its computational and memory efficiency. While most existing work focuses on the convergence of the objective function or the error of the obtained solution, we investigate the problem of statistical inference of the true model parameters based on SGD. To this end, we propose two consistent estimators of the asymptotic covariance of the average iterate from SGD: (1) an intuitive plug-in estimator and (2) a computationally more efficient batch-means estimator, which uses only the iterates from SGD. As the SGD process forms a time-inhomogeneous Markov chain, our batch-means estimator with carefully chosen increasing batch sizes generalizes the classical batch-means estimator designed for time-homogeneous Markov chains. The proposed batch-means estimator is of independent interest and can potentially be used to estimate the covariance of other time-inhomogeneous Markov chains. Both proposed estimators allow us to construct asymptotically exact confidence intervals and hypothesis tests. We further discuss an extension to conducting inference based on SGD for high-dimensional linear regression. Using a variant of the SGD algorithm, we construct a debiased estimator of each regression coefficient that is asymptotically normal. This gives a one-pass algorithm for computing both the sparse regression coefficient estimator and confidence intervals, which is computationally attractive and applicable to online data.
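To make the batch-means idea concrete, here is a minimal sketch, not the paper's exact algorithm: averaged SGD on a toy linear regression, followed by a batch-means covariance estimate with increasing batch sizes and coordinate-wise confidence intervals. The step-size constants, the linear batch-size schedule, and the toy data-generating model are all illustrative assumptions, not the schedules derived in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear model y = x^T theta* + noise (hypothetical setup for illustration).
d, n = 2, 20000
theta_star = np.array([1.0, -0.5])
X = rng.normal(size=(n, d))
y = X @ theta_star + rng.normal(scale=0.5, size=n)

# Polyak-Ruppert averaged SGD with step size eta_t = c * t^(-alpha), alpha in (1/2, 1).
theta = np.zeros(d)
iterates = np.empty((n, d))
for t in range(n):
    eta = 0.5 * (t + 1) ** -0.505
    grad = (X[t] @ theta - y[t]) * X[t]    # least-squares stochastic gradient
    theta -= eta * grad
    iterates[t] = theta
theta_bar = iterates.mean(axis=0)          # averaged iterate

# Batch-means covariance estimate: split the iterates into M batches whose
# sizes grow roughly linearly (one simple increasing schedule; the paper
# derives the appropriate rate), then compare batch means to the overall mean.
M = 40
ends = np.cumsum(np.arange(1, M + 1))
ends = (ends * n / ends[-1]).astype(int)   # rescale so the last batch ends at n
starts = np.concatenate(([0], ends[:-1]))
S = np.zeros((d, d))
for s, e in zip(starts, ends):
    bm = iterates[s:e].mean(axis=0) - theta_bar
    S += (e - s) * np.outer(bm, bm)
Sigma_hat = S / M   # estimates the asymptotic covariance of sqrt(n)*(theta_bar - theta*)

# Asymptotically exact 95% confidence interval for each coordinate of theta*.
half = 1.96 * np.sqrt(np.diag(Sigma_hat) / n)
ci = np.stack([theta_bar - half, theta_bar + half], axis=1)
```

The estimator never touches the data again after the SGD pass: it is built purely from the stored (or streamed) iterates, which is what makes the batch-means approach computationally cheaper than the plug-in alternative.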



Related Papers

- Statistical Inference for Model Parameters in Stochastic Gradient Descent via Batch Means
  Statistical inference of true model parameters based on stochastic gradi...

- A Fully Online Approach for Covariance Matrices Estimation of Stochastic Gradient Descent Solutions
  Stochastic gradient descent (SGD) algorithm is widely used for parameter...

- Statistical Inference with Stochastic Gradient Methods under φ-mixing Data
  Stochastic gradient descent (SGD) is a scalable and memory-efficient opt...

- Online covariance estimation for stochastic gradient descent under Markovian sampling
  We study the online overlapping batch-means covariance estimator for Sto...

- Covariance Estimators for the ROOT-SGD Algorithm in Online Learning
  Online learning naturally arises in many statistical and machine learnin...

- Statistical inference using SGD
  We present a novel method for frequentist statistical inference in M-est...

- Statistical inference with implicit SGD: proximal Robbins-Monro vs. Polyak-Ruppert
  The implicit stochastic gradient descent (ISGD), a proximal version of S...
