Estimating and accounting for unobserved covariates in high dimensional correlated data

08/17/2018
by   Chris McKennan, et al.

Many high-dimensional and high-throughput biological datasets have complex sample correlation structures; examples include longitudinal and multi-tissue data, as well as data with multiple treatment conditions or related individuals. These data, like nearly all high-throughput 'omic' data, are influenced by technical and biological factors unknown to the researcher, which, if unaccounted for, can severely distort estimation and inference on the effects of the known covariate of interest. We therefore developed CBCV and CorrConf: provably accurate and computationally efficient methods to choose the number of, and estimate, the latent confounding factors present in high-dimensional data with correlated or nonexchangeable residuals. We demonstrate each method's superior performance compared to other state-of-the-art methods by analyzing simulated multi-tissue gene expression data and by identifying sex-associated DNA methylation sites in a real, longitudinal twin study. As far as we are aware, these are the first methods to estimate the number of, and correct for, latent confounding factors in data with correlated or nonexchangeable residuals. An R package is available for download at https://github.com/chrismckennan/CorrConf.

