Combined Pruning for Nested Cross-Validation to Accelerate Automated Hyperparameter Optimization for Embedded Feature Selection in High-Dimensional Data with Very Small Sample

02/01/2022
by   Sigrun May, et al.

Applying tree-based embedded feature selection to exclude irrelevant features in high-dimensional data with very small sample sizes requires optimized hyperparameters for the model building process. In addition, nested cross-validation must be applied to this type of data to avoid biased estimates of model performance. The resulting long computation time can be reduced with pruning. However, standard pruning algorithms must prune late or risk aborting the calculation of promising hyperparameter sets, because the performance evaluation metric has high variance. To address this, we adapt the use of a state-of-the-art successive halving pruner and combine it with two new pruning strategies based on domain or prior knowledge. The first immediately stops the computation of trials whose results are semantically meaningless for the selected hyperparameter combination. The second is an extrapolating threshold pruning strategy suitable for nested cross-validation with high variance. Our proposed combined three-layer pruner keeps promising trials while reducing the number of models to be built by up to 81.3% compared to using a state-of-the-art asynchronous successive halving pruner alone. Our three-layer pruner implementation (available at https://github.com/sigrun-may/cv-pruner) speeds up data analysis or enables a deeper hyperparameter search within the same computation time. It consequently saves time, money and energy, reducing the CO2 footprint.
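To make the two knowledge-based layers more concrete, the following is a minimal sketch in Python. It is not the API of the cv-pruner package and not necessarily the authors' exact rules. All names (`optimistic_final_mean`, `should_prune`, `n_selected_features`) are hypothetical. It assumes a score where higher is better, takes "semantically meaningless" to mean that the model selected no features, and extrapolates by assuming every remaining fold scores as well as the best fold seen so far. The successive halving layer is omitted.

```python
from typing import Sequence


def optimistic_final_mean(
    scores: Sequence[float], total_folds: int, best_case: float
) -> float:
    """Extrapolate the mean over all folds, assuming each fold that has
    not been computed yet reaches `best_case` (an optimistic assumption)."""
    remaining = total_folds - len(scores)
    return (sum(scores) + remaining * best_case) / total_folds


def should_prune(
    scores: Sequence[float],
    total_folds: int,
    threshold: float,
    n_selected_features: int,
) -> bool:
    """Decide whether to stop a trial early (higher score = better)."""
    # Layer 1 (assumed meaning): a model that selected no features
    # carries no information for feature selection, so stop at once.
    if n_selected_features == 0:
        return True
    # Without any completed fold there is nothing to extrapolate from.
    if not scores:
        return False
    # Layer 2: prune only if even the optimistic extrapolation, which
    # uses the best fold so far for all remaining folds, misses the
    # threshold. This guards against pruning on one unlucky fold.
    best_case = max(scores)
    return optimistic_final_mean(scores, total_folds, best_case) < threshold


if __name__ == "__main__":
    # Two weak folds out of ten: the optimistic mean stays below 0.8.
    print(should_prune([0.50, 0.55], 10, 0.8, n_selected_features=5))
    # One strong and one weak fold: the trial is kept for now.
    print(should_prune([0.90, 0.70], 10, 0.8, n_selected_features=5))
```

Extrapolating from the best fold rather than the running mean is a deliberately conservative choice for this sketch: with high variance between folds, a trial is only dropped when it can no longer plausibly reach the threshold.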
