Learning Curves for Heterogeneous Feature-Subsampled Ridge Ensembles

07/06/2023
by Benjamin S. Ruben, et al.

Feature bagging is a well-established ensembling method which aims to reduce prediction variance by training estimators in an ensemble on random subsamples or projections of features. Typically, ensembles are chosen to be homogeneous, in the sense that the number of feature dimensions available to each estimator is uniform across the ensemble. Here, we introduce heterogeneous feature ensembling, with estimators built on varying numbers of feature dimensions, and consider its performance in a linear regression setting. We study an ensemble of linear predictors, each fit using ridge regression on a subset of the available features, and allow the number of features included in these subsets to vary. Using the replica trick from statistical physics, we derive learning curves for ridge ensembles with deterministic linear masks. We obtain explicit expressions for the learning curves in the case of equicorrelated data with isotropic feature noise. Using the derived expressions, we investigate the effect of subsampling and ensembling, finding sharp transitions in the optimal ensembling strategy in the parameter space of noise level, data correlations, and data-task alignment. Finally, we suggest variable-dimension feature bagging as a strategy to mitigate double descent for robust machine learning in practice.
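As a concrete illustration of the setup described in the abstract, the sketch below fits a small heterogeneous feature-subsampled ridge ensemble with scikit-learn and averages the members' predictions. The equicorrelated covariance, the subset sizes, the ridge penalty, and the noise level are illustrative assumptions, not values taken from the paper, and the empirical simulation stands in for the analytical learning curves derived there.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Illustrative problem sizes; the paper's analysis is asymptotic.
n_samples, n_features = 200, 100

# Equicorrelated Gaussian features: covariance (1 - c) * I + c * 11^T (assumed form).
c = 0.3
cov = (1 - c) * np.eye(n_features) + c * np.ones((n_features, n_features))
X = rng.multivariate_normal(np.zeros(n_features), cov, size=n_samples)

# Ground-truth linear teacher with isotropic label noise (assumed generative model).
w_star = rng.standard_normal(n_features) / np.sqrt(n_features)
y = X @ w_star + 0.1 * rng.standard_normal(n_samples)

# Heterogeneous feature subsampling: each ensemble member is trained on a
# different number of feature dimensions.
subset_sizes = [20, 40, 80]  # illustrative choice, not from the paper
alpha = 1e-2                 # ridge penalty, illustrative

members = []
for k in subset_sizes:
    idx = rng.choice(n_features, size=k, replace=False)
    model = Ridge(alpha=alpha).fit(X[:, idx], y)
    members.append((idx, model))

def ensemble_predict(X_new):
    """Average the predictions of the feature-subsampled ridge estimators."""
    preds = [m.predict(X_new[:, idx]) for idx, m in members]
    return np.mean(preds, axis=0)

# Test error of the ensemble on fresh data from the same distribution.
X_test = rng.multivariate_normal(np.zeros(n_features), cov, size=1000)
y_test = X_test @ w_star + 0.1 * rng.standard_normal(1000)
print("ensemble test MSE:", np.mean((ensemble_predict(X_test) - y_test) ** 2))
```

Sweeping the training set size or the subset sizes in this sketch yields empirical learning curves that can be compared against the paper's closed-form expressions.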

