Asymptotic Analysis of Sampling Estimators for Randomized Numerical Linear Algebra Algorithms

02/24/2020
by Ping Ma, et al.

Over the past few years, the statistical analysis of Randomized Numerical Linear Algebra (RandNLA) algorithms has focused mostly on their performance as point estimators. This is insufficient for statistical inference, e.g., constructing confidence intervals and testing hypotheses, because the distribution of the estimator is not available. In this article, we develop an asymptotic analysis to derive the distribution of RandNLA sampling estimators for the least-squares problem. In particular, we derive the asymptotic distribution of a general sampling estimator with arbitrary sampling probabilities. The analysis is conducted in two complementary settings: when the objective is to approximate the full-sample estimator, and when it is to infer the underlying ground-truth model parameters. For each setting, we show that the sampling estimator is asymptotically normally distributed under mild regularity conditions. Moreover, the sampling estimator is asymptotically unbiased in both settings. Based on our asymptotic analysis, we use two criteria, the Asymptotic Mean Squared Error (AMSE) and the Expected Asymptotic Mean Squared Error (EAMSE), to identify optimal sampling probabilities. Several of the resulting optimal sampling probability distributions are new to the literature, e.g., those underlying the root leverage sampling estimator and the predictor length sampling estimator. Our theoretical results clarify the role of leverage in the sampling process, and our empirical results demonstrate improvements over existing methods.
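To make the idea of a sampling estimator with arbitrary sampling probabilities concrete, here is a minimal NumPy sketch. It rests on assumptions not stated in the abstract: rows are drawn with replacement, each sampled row is rescaled by 1/sqrt(r * pi_i), "root leverage" means probabilities proportional to the square root of the leverage scores, and "predictor length" means probabilities proportional to the Euclidean norm of each row. The function names `sampling_probabilities` and `sampling_estimator` are hypothetical and are not taken from the paper.

```python
import numpy as np


def sampling_probabilities(X, scheme="uniform"):
    """Return row-sampling probabilities for X (assumed definitions, see lead-in)."""
    n = X.shape[0]
    if scheme == "uniform":
        w = np.ones(n)
    elif scheme in ("leverage", "root_leverage"):
        # Leverage scores are the squared row norms of Q from a thin QR of X.
        Q, _ = np.linalg.qr(X)
        h = np.sum(Q ** 2, axis=1)
        w = h if scheme == "leverage" else np.sqrt(h)
    elif scheme == "predictor_length":
        # Assumed: probability proportional to the Euclidean length of each row.
        w = np.linalg.norm(X, axis=1)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return w / w.sum()


def sampling_estimator(X, y, probs, r, rng):
    """Least-squares fit on r rows sampled with replacement and rescaled."""
    idx = rng.choice(X.shape[0], size=r, replace=True, p=probs)
    # Rescaling by 1/sqrt(r * pi_i) makes the subsampled objective an
    # unbiased estimate of the full least-squares objective.
    scale = 1.0 / np.sqrt(r * probs[idx])
    beta, *_ = np.linalg.lstsq(X[idx] * scale[:, None], y[idx] * scale, rcond=None)
    return beta


rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true  # noiseless responses, so any full-rank subsample recovers beta_true

estimates = {
    scheme: sampling_estimator(X, y, sampling_probabilities(X, scheme), r=50, rng=rng)
    for scheme in ("uniform", "leverage", "root_leverage", "predictor_length")
}
```

With noisy responses the four schemes would give different estimates, and comparing their spread over repeated draws is one informal way to see the variance differences that criteria such as AMSE are meant to capture.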


