D-optimal Subsampling Design for Massive Data Linear Regression

07/05/2023
by Torsten Reuter, et al.

Data reduction is a fundamental challenge of modern technology, where classical statistical methods are not applicable because of computational limitations. We consider linear regression for an extraordinarily large number of observations but only a few covariates. Subsampling aims to select a given percentage of the original data. Under distributional assumptions on the covariates, we derive D-optimal subsampling designs and study their theoretical properties, using fundamental concepts of optimal design theory and an equivalence theorem from constrained convex optimization. The resulting subsampling designs provide simple rules for accepting or rejecting a data point, which allows for an easy algorithmic implementation. In addition, we propose a simplified subsampling method that differs from the D-optimal design but requires less computing time. We present a simulation study comparing both subsampling schemes with the IBOSS method.
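
To make the idea of an accept/reject rule concrete, here is a minimal sketch. It is not taken from the paper: the abstract does not state the rule, so the sketch assumes a single covariate with a symmetric distribution and assumes that a point is accepted when it lies among the fraction alpha of points farthest from the center. The function names (tail_subsample_mask, information_det, fit_ols) and the simulated data are hypothetical. The determinant of the information matrix X'X is the quantity a D-optimal design seeks to make large.

```python
import numpy as np


def tail_subsample_mask(x, alpha):
    """Assumed accept/reject rule: accept the ceil(alpha * n) points
    whose covariate value is farthest from the sample median."""
    x = np.asarray(x, dtype=float)
    k = int(np.ceil(alpha * x.size))
    dist = np.abs(x - np.median(x))
    # Threshold is the k-th largest distance, found without a full sort.
    threshold = np.partition(dist, x.size - k)[x.size - k]
    return dist >= threshold


def information_det(x):
    """Determinant of X'X for the straight-line model y = b0 + b1 * x."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.det(X.T @ X)


def fit_ols(x, y):
    """Ordinary least squares fit of intercept and slope."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Simulated data (illustrative values only, not from the paper).
rng = np.random.default_rng(0)
n, alpha = 100_000, 0.01
x = rng.standard_normal(n)
y = 1.0 + 2.0 * x + rng.standard_normal(n)

keep = tail_subsample_mask(x, alpha)
uniform = rng.choice(n, size=int(keep.sum()), replace=False)

beta_tail = fit_ols(x[keep], y[keep])
det_tail = information_det(x[keep])
det_uniform = information_det(x[uniform])
```

In this toy setting the tail-based subsample yields a larger information determinant than a uniform random subsample of the same size, which is the kind of gain a D-optimal subsampling design targets. The rule is a single threshold comparison per data point, which reflects the easy algorithmic implementation the abstract mentions.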

research
11/13/2019

Optimality regions for designs in multiple linear regression models with correlated random coefficients

This paper studies optimal designs for linear regression models with cor...
research
04/04/2021

D-optimal designs for the Mitscherlich non-linear regression function

Mitscherlich's function is a well-known three-parameter non-linear regre...
research
06/20/2019

Active Linear Regression

We consider the problem of active linear regression where a decision mak...
research
10/29/2017

Distributional Consistency of Lasso by Perturbation Bootstrap

Least Absolute Shrinkage and Selection Operator or the Lasso, introduced...
research
03/27/2019

Asymptotics and Optimal Designs of SLOPE for Sparse Linear Regression

In sparse linear regression, the SLOPE estimator generalizes LASSO by as...
research
03/27/2022

Optimal Design for Estimating the Mean Ability over Time in Repeated Item Response Testing

We present general results on D-optimal designs for estimating the mean ...
research
03/04/2021

A convex approach to optimum design of experiments with correlated observations

Optimal design of experiments for correlated processes is an increasingl...
