Supervised Dimensionality Reduction and Visualization using Centroid-encoder

by Tomojit Ghosh, et al.
Colorado State University

Visualizing high-dimensional data is an essential task in Data Science and Machine Learning. The Centroid-Encoder (CE) method is similar to the autoencoder but incorporates label information to keep objects of a class close together in the reduced visualization space. CE exploits nonlinearity and labels to encode high variance in low dimensions while capturing the global structure of the data. We present a detailed analysis of the method on a wide variety of data sets and compare it with other supervised dimension reduction techniques, including NCA, nonlinear NCA, t-distributed NCA, t-distributed MCML, supervised UMAP, supervised PCA, Colored Maximum Variance Unfolding, supervised Isomap, Parametric Embedding, supervised Neighbor Retrieval Visualizer, and Multiple Relational Embedding. We show empirically that centroid-encoder outperforms most of these techniques. We also show that when the data variance is spread across multiple modalities, centroid-encoder extracts a significant amount of information from the data in a low-dimensional space. This key feature establishes its value as a tool for data visualization.
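The idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that the centroid-encoder keeps the autoencoder's encode/decode structure but replaces the reconstruction target of each sample with the centroid of that sample's class. The single tanh bottleneck layer, the plain gradient-descent loop, and all names (`class_centroids`, `train_centroid_encoder`, `lr`, `epochs`) are hypothetical choices made for illustration only.

```python
import numpy as np


def class_centroids(X, y):
    """Return an array whose i-th row is the centroid of sample i's class."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    targets = np.empty_like(X)
    for label in np.unique(y):
        mask = y == label
        targets[mask] = X[mask].mean(axis=0)
    return targets


def train_centroid_encoder(X, y, dim=2, lr=0.05, epochs=300, seed=0):
    """Fit a one-hidden-layer encoder/decoder whose output targets are class centroids.

    Assumed loss (illustrative): 0.5 * mean_i || decode(encode(x_i)) - c_{y_i} ||^2
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(d, dim))   # encoder weights
    W_dec = rng.normal(scale=0.1, size=(dim, d))   # decoder weights
    targets = class_centroids(X, y)
    losses = []
    for _ in range(epochs):
        Z = np.tanh(X @ W_enc)                     # low-dimensional embedding
        X_hat = Z @ W_dec                          # decoded output
        diff = X_hat - targets
        losses.append(0.5 * np.sum(diff ** 2) / n)
        err = diff / n                             # dLoss/dX_hat
        grad_dec = Z.T @ err
        grad_enc = X.T @ ((err @ W_dec.T) * (1.0 - Z ** 2))  # tanh derivative
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc, W_dec, losses


def embed(X, W_enc):
    """Project samples into the learned low-dimensional space."""
    return np.tanh(np.asarray(X, dtype=float) @ W_enc)


# Toy usage: two labelled Gaussian blobs in four dimensions (synthetic data).
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(+1.0, 0.3, size=(50, 4)),
                    rng.normal(-1.0, 0.3, size=(50, 4))])
y_demo = np.array([0] * 50 + [1] * 50)
W_enc_demo, W_dec_demo, losses_demo = train_centroid_encoder(X_demo, y_demo)
Z_demo = embed(X_demo, W_enc_demo)  # 2-D coordinates suitable for a scatter plot
```

Because every sample of a class shares the same target, the bottleneck is pushed to place same-class samples near each other, which is the behaviour the abstract attributes to CE. A practical model would likely use deeper networks, bias terms, and a standard optimizer.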




