Online Data Thinning via Multi-Subspace Tracking

09/12/2016
by   Xin Jiang, et al.

In an era of ubiquitous large-scale streaming data, the availability of data far exceeds the capacity of expert human analysts. In many settings, such data is either discarded or stored unprocessed in datacenters. This paper proposes a method of online data thinning, in which large-scale streaming datasets are winnowed to preserve unique, anomalous, or salient elements for timely expert analysis. At the heart of the proposed approach is an online anomaly detection method based on dynamic, low-rank Gaussian mixture models. Specifically, the high-dimensional covariance matrices of the Gaussian components are modeled as low-rank. Under this model, most observations lie near a union of subspaces. The low-rank modeling mitigates the curse of dimensionality associated with anomaly detection for high-dimensional data, and recent advances in subspace clustering and subspace tracking allow the proposed method to adapt to dynamic environments. Furthermore, the proposed method allows subsampling, is robust to missing data, and uses a mini-batch online optimization approach. The resulting algorithms are scalable, efficient, and capable of operating in real time. Experiments on wide-area motion imagery and e-mail databases illustrate the efficacy of the proposed approach.
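To make the union-of-subspaces idea concrete, the sketch below scores each observation by its squared distance to the nearest of several subspaces and keeps only the high-scoring ones. This is an illustration under simplifying assumptions, not the paper's algorithm: it uses a plain projection residual in place of a Gaussian mixture likelihood, assumes fixed subspaces that pass through the origin, and omits the online tracking step. The names `residual_to_subspace`, `anomaly_score`, `thin`, and `threshold` are hypothetical.

```python
import numpy as np


def residual_to_subspace(x, U, mask=None):
    """Squared distance from x to span(U), using only the observed entries.

    x    : (d,) observation
    U    : (d, r) basis for one subspace (assumed, not learned here)
    mask : (d,) boolean array, True where x is observed; None means all observed
    """
    if mask is None:
        mask = np.ones(x.shape[0], dtype=bool)
    U_obs = U[mask]
    x_obs = x[mask]
    # Least-squares fit of the observed entries by the matching rows of U
    coeffs, *_ = np.linalg.lstsq(U_obs, x_obs, rcond=None)
    r = x_obs - U_obs @ coeffs
    return float(r @ r)


def anomaly_score(x, subspaces, mask=None):
    """Residual to the closest subspace in the union (hypothetical score)."""
    return min(residual_to_subspace(x, U, mask) for U in subspaces)


def thin(batch, subspaces, threshold):
    """Return indices of observations whose score exceeds the threshold."""
    return [i for i, x in enumerate(batch)
            if anomaly_score(x, subspaces) > threshold]


if __name__ == "__main__":
    d = 5
    eye = np.eye(d)
    subspaces = [eye[:, :2], eye[:, 2:3]]       # two toy subspaces in R^5
    batch = [np.array([1.0, 2.0, 0.0, 0.0, 0.0]),   # lies in the first subspace
             np.array([0.0, 0.0, 0.0, 0.0, 3.0])]   # far from both subspaces
    print(thin(batch, subspaces, threshold=1.0))
```

Restricting the fit to observed entries is one simple way to tolerate missing data; a threshold on the score then decides which observations are passed on for expert review.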


