Fast cross-validation for multi-penalty ridge regression

05/19/2020
by Mark A. van de Wiel, et al.

Prediction based on multiple high-dimensional data types needs to account for potentially strong differences in predictive signal. Ridge regression is a simple yet versatile and interpretable model for high-dimensional data that has challenged the predictive performance of many more complex models and learners, in particular in dense settings. Moreover, it allows a specific penalty per data type to account for differences between those types. The main challenge for multi-penalty ridge is then to optimize these penalties efficiently in a cross-validation (CV) setting, in particular for GLM and Cox ridge regression, which require an additional loop for fitting the model by iterative weighted least squares (IWLS). Our main contribution is a computationally very efficient formula for the multi-penalty, sample-weighted hat-matrix, as used in the IWLS algorithm. As a result, nearly all computations are performed in the low-dimensional sample space. We show that our approach is several orders of magnitude faster than more naive implementations. We develop a very flexible framework that includes prediction of several types of response, allows for unpenalized covariates, can optimize several performance criteria and implements repeated CV. Moreover, extensions that pair data types and that allow a preferential order of data types are included and illustrated on several cancer genomics survival prediction problems. The corresponding R-package, multiridge, serves as a versatile standalone tool, but also as a fast benchmark for other, more complex models and multi-view learners.
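The speed-up rests on a standard dual/Woodbury identity: for the weighted ridge problem solved in each IWLS step, min_beta (z - X beta)^T W (z - X beta) + beta^T Lambda beta, the fitted values can be written as X beta = K (K + W^{-1})^{-1} z with K = sum_b X_b X_b^T / lambda_b, so only n x n matrices are ever formed, whatever the number of features. Below is a minimal, hedged R sketch of this idea for logistic ridge with two penalized data blocks. It is not the multiridge API; the function and variable names (fit_multiridge_logistic, X_blocks, lambdas) are invented for illustration.

```r
## Minimal sketch, NOT the multiridge API: all names are illustrative.
## Idea: for block penalties lambda_b, the penalized fit only needs the
## n x n penalty-weighted kernel K = sum_b X_b X_b^T / lambda_b, so each
## IWLS step works in sample space via eta = K (K + W^{-1})^{-1} z.

fit_multiridge_logistic <- function(X_blocks, y, lambdas,
                                    n_iter = 25, tol = 1e-8) {
  n <- length(y)
  ## n x n kernel; the feature dimension never enters the iteration cost
  K <- Reduce(`+`, Map(function(X, l) tcrossprod(X) / l, X_blocks, lambdas))
  eta <- rep(0, n)                       # linear predictor
  for (it in seq_len(n_iter)) {
    prob <- 1 / (1 + exp(-eta))          # current probabilities
    w <- as.vector(prob * (1 - prob))    # IWLS weights
    z <- eta + (y - prob) / w            # working response
    ## weighted ridge fit in dual form: eta_new = K (K + W^{-1})^{-1} z
    eta_new <- K %*% solve(K + diag(1 / w, n), z)
    if (max(abs(eta_new - eta)) < tol) { eta <- eta_new; break }
    eta <- eta_new
  }
  drop(eta)
}

## Toy usage: two data types, n = 50 samples, 1200 features in total
set.seed(1)
n  <- 50
X1 <- matrix(rnorm(n * 1000), n)         # e.g. expression block
X2 <- matrix(rnorm(n * 200),  n)         # e.g. methylation block
y  <- rbinom(n, 1, 0.5)
eta_hat <- fit_multiridge_logistic(list(X1, X2), y, lambdas = c(10, 1))
```

Because K is fixed across IWLS iterations and across candidate penalty vectors up to the block weights, a CV loop over (lambda_1, ..., lambda_B) only has to recombine precomputed blocks X_b X_b^T, which is what makes multi-penalty optimization cheap in this setting.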

Related research

02/07/2019
Estimation of variance components, heritability and the ridge penalty in high-dimensional generalized linear models
For high-dimensional linear regression models, we review and compare sev...

11/09/2019
Influence of single observations on the choice of the penalty parameter in ridge regression
Penalized regression methods, such as ridge regression, heavily rely on ...

10/06/2018
Cross validating extensions of kernel, sparse or regular partial least squares regression models to censored data
When cross-validating standard or extended Cox models, the commonly used...

07/19/2021
Can we globally optimize cross-validation loss? Quasiconvexity in ridge regression
Models like LASSO and ridge regression are extensively used in practice ...

10/29/2020
Group-regularized ridge regression via empirical Bayes noise level cross-validation
Features in predictive models are not exchangeable, yet common supervise...

12/23/2021
Cooperative learning for multi-view analysis
We propose a new method for supervised learning with multiple sets of fe...

05/08/2020
Flexible co-data learning for high-dimensional prediction
Clinical research often focuses on complex traits in which many variable...
