Multi-sample estimation of centered log-ratio matrix in microbiome studies

06/15/2021
by Yezheng Li, et al.

In microbiome studies, one way to study bacterial abundances is to estimate bacterial composition from sequencing read counts. Various transformations are then applied to such compositional data for downstream statistical analysis, among which the centered log-ratio (clr) transformation is the most commonly used. Because of limited sequencing depth and DNA dropouts, many rare bacterial taxa might not be captured in the final sequencing reads, which results in many zero counts. Naive composition estimation using count normalization then yields many zero proportions, which makes the clr transformation infeasible because the logarithm of zero is undefined. This paper proposes a multi-sample approach that estimates the clr matrix directly, borrowing information across samples and across species. Empirical results from real datasets suggest that the clr matrix over multiple samples is approximately low rank, which motivates regularized maximum likelihood estimation with a nuclear norm penalty. An efficient optimization algorithm based on the generalized accelerated proximal gradient method is developed. Theoretical upper bounds on the estimation error and on the corresponding singular subspace errors are established. Simulation studies demonstrate that the proposed estimator outperforms the naive estimators. The method is applied to the Gut Microbiome dataset and the American Gut Project.
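To make the zero-count problem concrete, the following is a minimal sketch of naive count normalization followed by the clr transformation, which maps each part to its logarithm minus the mean of the logarithms over all parts. It is not the estimator proposed in the paper; the function names and the read counts are hypothetical and chosen only for illustration.

```python
import math


def naive_composition(counts):
    """Normalize read counts to proportions (naive count normalization)."""
    total = sum(counts)
    return [c / total for c in counts]


def clr(parts):
    """Centered log-ratio: log of each part minus the mean of the logs.

    Raises ValueError if any part is zero or negative, because the
    logarithm is undefined there.
    """
    if any(p <= 0 for p in parts):
        raise ValueError("clr is undefined when a part is zero")
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Hypothetical read counts for one sample over four taxa.
dense_counts = [120, 30, 40, 10]
sparse_counts = [120, 0, 75, 5]  # one taxon was not captured by sequencing

# With no zeros, the clr vector exists and its entries sum to zero.
dense_clr = clr(naive_composition(dense_counts))

# With a zero count, the naive proportion is zero and clr cannot be computed.
try:
    sparse_clr = clr(naive_composition(sparse_counts))
except ValueError:
    sparse_clr = None
```

Here `dense_clr` is a well-defined vector, while `sparse_clr` ends up as `None` because a single zero count is enough to break the transformation for the whole sample. This is the situation that motivates estimating the clr matrix jointly over many samples rather than transforming each naive composition separately.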

research
11/28/2018

High-dimensional Log-Error-in-Variable Regression with Applications to Microbial Compositional Data Analysis

In microbiome and genomic study, the regression of compositional data ha...
research
04/20/2020

Robust Covariance Estimation for High-dimensional Compositional Data with Application to Microbial Communities Analysis

Microbial communities analysis is drawing growing attention due to the r...
research
10/15/2021

A new class of α-transformations for the spatial analysis of Compositional Data

Georeferenced compositional data are prominent in many scientific fields...
research
01/28/2020

Low-rank matrix denoising for count data using unbiased Kullback-Leibler risk estimation

This paper is concerned by the analysis of observations organized in a m...
research
05/28/2013

Adaptive estimation of the copula correlation matrix for semiparametric elliptical copulas

We study the adaptive estimation of copula correlation matrix Σ for the ...
research
06/30/2023

Robust minimum divergence estimation in a spatial Poisson point process

Species distribution modeling (SDM) plays a crucial role in investigatin...
research
04/18/2019

Testing for differential abundance in compositional counts data, with application to microbiome studies

In order to identify which taxa differ in the microbiome community acros...
