Linear Optimal Low Rank Projection for High-Dimensional Multi-Class Data

09/05/2017
by   Joshua T. Vogelstein, et al.

Classification of individual samples into one or more categories is critical to modern scientific inquiry. Most modern datasets, such as those used in genetic analysis or imaging, include numerous features, such as genes or pixels. Principal Components Analysis (PCA) is now generally used to find low-dimensional representations of such features for further analysis. However, PCA ignores class label information, thereby discarding information that could substantially improve downstream classification performance. Here we describe an approach called "Linear Optimal Low-rank" projection (LOL), which extends PCA by incorporating the class labels. Using theory and synthetic data, we show that LOL leads to a better representation of the data for subsequent classification than PCA while adding negligible computational cost. Experimentally, we demonstrate that LOL substantially outperforms PCA in differentiating cancer patients from healthy controls using genetic data, and in differentiating gender from magnetic resonance imaging data incorporating >500 million features and 400 gigabytes of data. LOL allows the solution of previously intractable problems, yet requires only a few minutes to run on a single desktop computer.
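The abstract says only that LOL extends PCA by incorporating the class labels; it does not give the construction. As a rough illustration, the sketch below assumes one plausible formulation: place the differences between class means first, append the top principal directions of the data after each sample is centered by its own class mean, and orthonormalize the result. The function name `lol_projection`, its parameters, and the synthetic data are hypothetical and are not taken from the authors' software.

```python
import numpy as np


def lol_projection(X, y, d):
    """Sketch of a label-aware low-rank projection (assumed formulation).

    Returns a (p, d) matrix with orthonormal columns spanning the class-mean
    differences plus the top principal directions of the class-centered data.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)

    # One mean vector per class, shape (K, p).
    means = np.stack([X[y == c].mean(axis=0) for c in classes])

    # Differences of each class mean from the first class mean, shape (p, K-1).
    deltas = (means[1:] - means[0]).T

    # Center every sample by the mean of its own class (this is where the
    # labels enter; plain PCA would subtract the grand mean instead).
    class_index = np.searchsorted(classes, y)
    X_centered = X - means[class_index]

    # Rows of Vt are principal directions of the class-centered data.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)

    # Fill the remaining columns with the leading principal directions.
    n_pca = max(d - deltas.shape[1], 0)
    A = np.hstack([deltas, Vt[:n_pca].T])[:, :d]

    # Orthonormalize so the columns form a basis of the projected subspace.
    Q, _ = np.linalg.qr(A)
    return Q


# Synthetic two-class data: the class signal sits on the lowest-variance
# feature, which variance-only methods such as PCA would rank last.
rng = np.random.default_rng(0)
n, p = 200, 50
y = rng.integers(0, 2, size=n)
scales = np.linspace(5.0, 0.5, p)
X = rng.normal(size=(n, p)) * scales
X[:, -1] += 3.0 * y

A = lol_projection(X, y, d=3)
Z = X @ A  # low-dimensional representation, shape (n, 3)
```

Under these assumptions the only extra work beyond PCA is computing the class means, which is consistent with the abstract's claim of negligible added cost. The projected data `Z` can then be passed to any downstream classifier.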


