SMSSVD - SubMatrix Selection Singular Value Decomposition

by Rasmus Henningsson, et al.

High-throughput biomedical measurements normally capture multiple overlaid biologically relevant signals, and often also signals representing different types of technical artefacts such as batch effects. Signal identification and decomposition are accordingly main objectives in statistical biomedical modeling and data analysis. Existing methods aimed at signal reconstruction and deconvolution are, in general, either supervised, contain parameters that need to be estimated, or present other types of ad hoc features. We here introduce SubMatrix Selection Singular Value Decomposition (SMSSVD), a parameter-free unsupervised signal decomposition and dimension reduction method designed to reduce noise adaptively for each low-rank signal in a given data matrix, and to represent the signals in the data in a way that enables unbiased exploratory analysis and reconstruction of multiple overlaid signals, including identifying groups of variables that drive different signals. SMSSVD produces a denoised signal decomposition from a given data matrix. The method guarantees orthogonality between signal components in a straightforward manner and is designed to make automation possible. We illustrate SMSSVD by applying it to several real and synthetic datasets and compare its performance to gold standard methods such as PCA (Principal Component Analysis) and SPC (Sparse Principal Components, using Lasso constraints). SMSSVD is computationally efficient and, despite being a parameter-free method, in general outperforms existing statistical learning methods. A Julia implementation of SMSSVD is openly available on GitHub (



