DROP: Dimensionality Reduction Optimization for Time Series

by Sahaana Suri, et al.

Dimensionality reduction is critical in analyzing increasingly high-volume, high-dimensional time series. In this paper, we revisit a now-classic study of time series dimensionality reduction operators and find that, for a given quality constraint, Principal Component Analysis (PCA) uncovers representations that are over 2x smaller than those obtained via alternative techniques favored in the literature. However, as classically implemented via Singular Value Decomposition (SVD), PCA is incredibly expensive for large datasets. Therefore, we present DROP, a dimensionality reduction optimizer for high-dimensional analytics pipelines that greatly reduces the cost of the PCA operation over time series datasets. We show that many time series are highly structured, so a small number of data points is sufficient to characterize the dataset, which permits aggressive sampling during dimensionality reduction. This sampling allows DROP to uncover high-quality low-dimensional bases in running time proportional to the dataset's intrinsic dimensionality, independent of the actual dataset size, without requiring the user to specify this intrinsic dimensionality a priori. DROP further enables downstream-operation-aware optimization by coupling sampling with online progress estimation, trading off the degree of dimensionality reduction against the combined runtime of DROP and downstream analytics tasks. By progressively sampling its input, computing a candidate basis for transformation, and terminating once it finds a sufficiently high-quality basis in a reasonable running time, DROP provides speedups of up to 50x over PCA via SVD and 33x in end-to-end high-dimensional analytics pipelines.
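The loop described at the end of the abstract (sample progressively, compute a candidate basis, stop once the basis is good enough) can be illustrated with a minimal sketch. This is not the DROP implementation: the function names `progressive_pca` and `distance_quality`, the doubling sample schedule, and the quality metric (average ratio of projected to original pairwise Euclidean distance) are all assumptions chosen for illustration, and the sketch omits the runtime and progress estimation that the abstract mentions.

```python
import numpy as np


def distance_quality(X, basis, rng, n_pairs=500):
    """Mean ratio of projected to original distance over random row pairs.

    Assumed stand-in for the paper's quality constraint; 1.0 means the
    projection preserves the sampled pairwise distances exactly.
    """
    i = rng.integers(0, len(X), size=n_pairs)
    j = rng.integers(0, len(X), size=n_pairs)
    diff = X[i] - X[j]
    original = np.linalg.norm(diff, axis=1)
    projected = np.linalg.norm(diff @ basis, axis=1)
    valid = original > 0  # skip pairs that picked identical rows
    return float(np.mean(projected[valid] / original[valid]))


def progressive_pca(X, target=0.98, start=20, growth=2, seed=0):
    """Fit PCA on growing random samples until the quality target is met.

    Returns (basis, k, sample_size), where basis has shape (d, k).
    The start size and growth factor are hypothetical defaults.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    size = min(start, n)
    while True:
        sample = X[rng.choice(n, size=size, replace=False)]
        centered = sample - sample.mean(axis=0)
        # Rows of vt are the sample's principal directions, strongest first.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        # Try the smallest basis first so the returned k is as low as possible.
        for k in range(1, vt.shape[0] + 1):
            basis = vt[:k].T
            if distance_quality(X, basis, rng) >= target:
                return basis, k, size
        if size == n:
            # Whole dataset used and target still unmet: return the full basis.
            return vt.T, vt.shape[0], size
        size = min(n, size * growth)
```

On data that lies close to a low-dimensional subspace, a sketch like this would be expected to stop at a sample far smaller than the dataset, which is the behavior the abstract attributes to highly structured time series. The SVD cost then depends on the sample size rather than on the number of rows.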




