Semi-supervised Inference for Explained Variance in High-dimensional Linear Regression and Its Applications

06/16/2018
by T. Tony Cai, et al.

We consider statistical inference for the explained variance β^⊺Σβ under the high-dimensional linear model Y=Xβ+ϵ in the semi-supervised setting, where β is the regression vector and Σ is the design covariance matrix. We propose a calibrated estimator that efficiently integrates both labelled and unlabelled data, and show that it achieves the minimax optimal rate of convergence in the general semi-supervised framework. The optimality result characterizes how the unlabelled data affect the minimax optimal rate. Moreover, we establish the limiting distribution of the proposed estimator and construct data-driven confidence intervals for the explained variance. We further develop a randomized calibration technique for statistical inference in the presence of weak signals and apply the resulting inference procedures to a range of important statistical problems, including signal detection and global testing, prediction accuracy evaluation, and confidence ball construction. The numerical performance of the proposed methodology is demonstrated through simulation studies and an analysis estimating heritability in a yeast segregant data set with multiple traits.
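To illustrate why unlabelled data can help, the sketch below estimates β^⊺Σβ with a generic bias-corrected plug-in: a Lasso pilot estimate β̂ from the labelled data, a covariance estimate Σ̂ pooled over labelled and unlabelled covariates, and the one-step correction 2β̂^⊺X^⊺(y − Xβ̂)/n for the Lasso's shrinkage bias. This is a minimal sketch under stated assumptions (centred Gaussian covariates, no intercept, a fixed universal tuning parameter). It is not the calibrated estimator from the paper, and it omits the randomized calibration for weak signals. The function names `lasso_cd` and `explained_variance_estimate` are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Scalar soft-thresholding operator used in coordinate-descent Lasso."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Hypothetical helper: coordinate-descent Lasso for
    (1/2n)||y - Xb||^2 + lam * ||b||_1, with no intercept."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()  # current residual y - X b
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # correlation of column j with the partial residual (excluding b_j)
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * (new_bj - b[j])  # keep residual in sync with b
            b[j] = new_bj
    return b


def explained_variance_estimate(X_lab, y, X_unlab, lam):
    """Hypothetical sketch: bias-corrected plug-in estimate of beta' Sigma beta.

    Sigma is estimated from labelled AND unlabelled covariates, which is
    where the unlabelled data enter. Assumes centred covariates.
    """
    n = X_lab.shape[0]
    beta_hat = lasso_cd(X_lab, y, lam)
    X_all = np.vstack([X_lab, X_unlab])
    sigma_hat = X_all.T @ X_all / X_all.shape[0]
    plug_in = beta_hat @ sigma_hat @ beta_hat
    # one-step correction: X'(y - X beta_hat)/n approximates Sigma(beta - beta_hat)
    correction = 2.0 * beta_hat @ (X_lab.T @ (y - X_lab @ beta_hat)) / n
    return plug_in + correction


# Simulated example: Sigma = I and 5 unit coefficients, so beta' Sigma beta = 5.
rng = np.random.default_rng(0)
n, N, p = 200, 1000, 50
beta = np.zeros(p)
beta[:5] = 1.0
X_lab = rng.standard_normal((n, p))
X_unlab = rng.standard_normal((N, p))
y = X_lab @ beta + rng.standard_normal(n)
lam = np.sqrt(2 * np.log(p) / n)  # a common universal choice, assumed here
q_hat = explained_variance_estimate(X_lab, y, X_unlab, lam)
print(round(float(q_hat), 3))
```

The correction term matters because the Lasso shrinks β̂ toward zero, so the naive plug-in β̂^⊺Σ̂β̂ tends to underestimate β^⊺Σβ. Pooling unlabelled covariates into Σ̂ reduces the variance of the quadratic term without requiring additional responses.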

research
06/07/2021

Semi-Supervised Statistical Inference for High-Dimensional Linear Regression with Blockwise Missing Data

Blockwise missing data occurs frequently when we integrate multisource o...
research
01/18/2022

Statistical Inference on Explained Variation in High-dimensional Linear Model with Dense Effects

Statistical inference on the explained variation of an outcome by a set ...
research
06/17/2023

Distributed Semi-Supervised Sparse Statistical Inference

This paper is devoted to studying the semi-supervised sparse statistical...
research
01/11/2022

Estimation and Inference with Proxy Data and its Genetic Applications

Existing high-dimensional statistical methods are largely established fo...
research
06/28/2019

Large-scale inference with block structure

The detection of weak and rare effects in large amounts of data arises i...
research
04/29/2019

Minimax semi-supervised confidence sets for multi-class classification

In this work we study the semi-supervised framework of confidence set cl...
research
02/02/2019

High-dimensional semi-supervised learning: in search for optimal inference of the mean

We provide a high-dimensional semi-supervised inference framework focuse...