Interpretable Linear Dimensionality Reduction based on Bias-Variance Analysis

03/26/2023
by Paolo Bonetti, et al.

One of the central issues in many machine learning applications on real data is the choice of input features. Ideally, the designer should select only the relevant, non-redundant features, preserving the information contained in the original dataset while obtaining a lower-dimensional representation with little collinearity among features. This helps mitigate problems such as overfitting and the curse of dimensionality, which arise in high-dimensional settings. On the other hand, simply discarding features is undesirable, since they may still carry information that can be exploited to improve results. Instead, dimensionality reduction techniques limit the number of features in a dataset by projecting them into a lower-dimensional space, possibly taking all the original features into account. However, the projected features produced by such techniques are usually difficult to interpret. In this paper, we seek to design a principled dimensionality reduction approach that maintains the interpretability of the resulting features. Specifically, we propose a bias-variance analysis for linear models and leverage these theoretical results to design an algorithm, Linear Correlated Features Aggregation (LinCFA), which aggregates groups of continuous features by replacing each group with its average when the features' correlation is "sufficiently large". In this way, all features are considered, the dimensionality is reduced, and interpretability is preserved. Finally, we numerically validate the proposed algorithm both on synthetic datasets, to confirm the theoretical results, and on real datasets, to show some promising applications.
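The core idea described in the abstract can be sketched in a few lines of code. Note that in LinCFA the aggregation threshold is derived from the paper's bias-variance analysis; the fixed `threshold` parameter and the greedy grouping strategy below are simplifying assumptions for illustration only.

```python
import numpy as np

def aggregate_correlated_features(X, threshold=0.9):
    """Greedily group columns of X whose pairwise correlation exceeds
    `threshold`, then replace each group with its column-wise average.

    NOTE: a simplified sketch of correlation-based feature aggregation;
    the actual LinCFA algorithm derives its threshold from a
    bias-variance analysis of linear models rather than a fixed value.
    """
    n_features = X.shape[1]
    corr = np.corrcoef(X, rowvar=False)
    groups, assigned = [], set()
    for i in range(n_features):
        if i in assigned:
            continue
        group = [i]
        assigned.add(i)
        for j in range(i + 1, n_features):
            # add j only if it is strongly correlated with every member
            if j not in assigned and all(corr[j, k] >= threshold for k in group):
                group.append(j)
                assigned.add(j)
        groups.append(group)
    # each group of correlated features is replaced by its average
    X_reduced = np.column_stack([X[:, g].mean(axis=1) for g in groups])
    return X_reduced, groups
```

For example, a dataset with two nearly identical columns and one independent column would be reduced from three features to two, with the correlated pair averaged into a single interpretable feature.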


