Bayesian Approaches for Flexible and Informative Clustering of Microbiome Data

07/31/2020
by Yushu Shi, et al.

We propose two unsupervised clustering methods designed for human microbiome data. Existing clustering approaches do not fully address the challenges of microbiome data, which are typically structured as counts with a fixed-sum constraint. In addition to accounting for this structure, we recognize that high-dimensional microbiome datasets often contain uninformative features, or "noise" operational taxonomic units (OTUs), that hinder successful clustering. To address this challenge, we select the features that are useful in differentiating groups during the clustering process. By taking a Bayesian modeling approach, we are able to learn the number of clusters from the data rather than fixing it in advance. We first describe a basic version of the model that uses Dirichlet multinomial distributions as mixture components and does not require any additional information on the OTUs. When phylogenetic or taxonomic information is available, however, we rely on Dirichlet tree multinomial distributions, which capture the tree-based topological structure of microbiome data. We test the performance of our methods through simulation, and illustrate their application first to gut microbiome data of children from different regions of the world, and then to a clinical study exploring differences in the microbiome between long-term and short-term pancreatic cancer survivors. Our results demonstrate that the proposed methods have performance advantages over commonly used unsupervised clustering algorithms, with the additional scientific benefit of identifying informative features.
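
To make the notion of a Dirichlet multinomial mixture component concrete, the following is a minimal Python sketch, not the authors' implementation. It evaluates the Dirichlet multinomial log-probability of one sample's OTU count vector and the resulting membership probabilities under a mixture. The function names, the fixed number of components, and the example parameter values are all assumptions made for illustration; the paper's methods additionally learn the number of clusters and select informative features, neither of which is shown here.

```python
import math


def dm_log_pmf(counts, alpha):
    """Log-probability of a count vector under a Dirichlet multinomial.

    counts: non-negative integer OTU counts for one sample.
    alpha:  positive concentration parameters, one per OTU (illustrative values).
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    n = sum(counts)   # total reads in the sample (the fixed sum)
    a = sum(alpha)    # total concentration
    # Multinomial coefficient: log n! - sum_j log x_j!
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # Ratio of gamma functions from integrating out the proportions
    log_norm = math.lgamma(a) - math.lgamma(n + a)
    log_terms = sum(
        math.lgamma(x + al) - math.lgamma(al) for x, al in zip(counts, alpha)
    )
    return log_coef + log_norm + log_terms


def component_responsibilities(counts, alphas, weights):
    """Probability that one sample belongs to each mixture component.

    alphas:  one concentration vector per component (hypothetical inputs).
    weights: mixture weights, assumed positive and summing to one.
    """
    log_post = [
        math.log(w) + dm_log_pmf(counts, al) for w, al in zip(weights, alphas)
    ]
    m = max(log_post)  # subtract the maximum for numerical stability
    unnorm = [math.exp(lp - m) for lp in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]
```

For example, component_responsibilities([9, 1], [[5.0, 1.0], [1.0, 5.0]], [0.5, 0.5]) assigns most of the probability to the first component, whose concentration parameters favor the first OTU.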


