LowCon: A design-based subsampling approach in a misspecified linear model

by Cheng Meng, et al.

We consider a measurement-constrained supervised learning problem, in which (1) the full sample of predictors is given, and (2) the response observations are unavailable and expensive to measure. It is therefore desirable to select a subsample of predictor observations, measure the corresponding responses, and then fit the supervised learning model on the subsample of predictors and responses. However, model fitting is a trial-and-error process, and a postulated model for the data could be misspecified. Our empirical studies demonstrate that most existing subsampling methods perform unsatisfactorily when the model is misspecified. In this paper, we develop a novel subsampling method, called "LowCon", which outperforms the competing methods when the working linear model is misspecified. Our method uses orthogonal Latin hypercube designs to achieve robust estimation. We show that the proposed design-based estimator approximately minimizes the so-called "worst-case" bias with respect to many possible misspecification terms. Both simulated and real-data analyses demonstrate that the proposed estimator is more robust than several subsample least squares estimators obtained by state-of-the-art subsampling methods.
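
The workflow described in the abstract (select predictor rows, measure only their responses, fit least squares) can be illustrated with a rough sketch. This is not the authors' LowCon algorithm: it assumes a plain Latin hypercube from SciPy rather than an orthogonal one, and a simple nearest-neighbor rule for matching design points to data rows, which may differ from the paper's selection rule. The function names `design_based_subsample` and `fit_ols`, and the simulated data, are hypothetical.

```python
import numpy as np
from scipy.stats import qmc
from scipy.spatial import cKDTree


def design_based_subsample(X, n_sub, seed=0):
    """Return indices of the rows of X nearest to Latin hypercube design points.

    Assumption: a plain (not orthogonal) Latin hypercube and a
    nearest-neighbor match stand in for the paper's design step.
    """
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Scale predictors to the unit hypercube (assumes no constant column).
    U = (X - lo) / (hi - lo)
    design = qmc.LatinHypercube(d=X.shape[1], seed=seed).random(n_sub)
    # For each design point, find the closest data row; drop duplicates.
    _, idx = cKDTree(U).query(design, k=1)
    return np.unique(idx)


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


# Simulated full sample of predictors (responses are not yet measured).
rng = np.random.default_rng(0)
X_full = rng.normal(size=(5000, 2))

# Select a subsample, then "measure" responses only for those rows.
idx = design_based_subsample(X_full, n_sub=50)
X_sub = X_full[idx]
# The sine term plays the role of a misspecification the linear model ignores.
y_sub = (1.0 + X_sub @ np.array([2.0, -1.0])
         + 0.1 * np.sin(3 * X_sub[:, 0])
         + 0.05 * rng.normal(size=len(idx)))

beta_hat = fit_ols(X_sub, y_sub)
```

Because the design points spread evenly over the predictor range, the selected rows cover the data region rather than concentrating on extreme points, which is the intuition behind preferring a space-filling design when the working model may be wrong.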




Joint Likelihood-based Principal Components Regression

We propose a method for estimating principal components regressions by m...

Minimax experimental design: Bridging the gap between statistical and worst-case approaches to least squares regression

In experimental design, we are given a large collection of vectors, each...

On estimation of the effect lag of predictors and prediction in functional linear model

We propose a functional linear model to predict a response using multipl...

Feature screening for multi-response linear models by empirical likelihood

This paper proposes a new feature screening method for the multi-respons...

Spectral Estimators for Structured Generalized Linear Models via Approximate Message Passing

We consider the problem of parameter estimation from observations given ...

Empirical Wavelet-based Estimation for Non-linear Additive Regression Models

Additive regression models are actively researched in the statistical fi...

Forbidden Knowledge and Specialized Training: A Versatile Solution for the Two Main Sources of Overfitting in Linear Regression

Overfitting in linear regression is broken down into two main causes. Fi...
