Bayesian outcome-guided multi-view mixture models with applications in molecular precision medicine

by Paul D. W. Kirk, et al.

Clustering is commonly performed as an initial analysis step for uncovering structure in 'omics datasets, e.g. to discover molecular subtypes of disease. The high-throughput, high-dimensional nature of these datasets means that they provide information on a diverse array of biomolecular processes and pathways. Different groups of variables (e.g. genes or proteins) will be implicated in different biomolecular processes, and hence analyses that identify only a single clustering partition of the whole dataset are liable to conflate the multiple clustering structures that may arise from these distinct processes. To address this, we propose a multi-view Bayesian mixture model that identifies groups of variables (“views”), each of which defines a distinct clustering structure. We consider applications in stratified medicine, for which our principal goal is to identify clusters of patients that define distinct, clinically actionable disease subtypes. We adopt the semi-supervised, outcome-guided mixture modelling approach of Bayesian profile regression, which makes use of a response variable to guide inference toward the clusterings that are most relevant in a stratified medicine context. We present the model, together with illustrative simulation examples and examples from pan-cancer proteomics. We demonstrate how the approach can be used to perform integrative clustering, and consider an example in which different 'omics datasets are integrated in the context of breast cancer subtyping.




Outcome-guided Bayesian Clustering for Disease Subtype Discovery Using High-dimensional Transcriptomic Data

The discovery of disease subtypes is an essential step for developing pr...

Bayesian profile regression for clustering analysis involving a longitudinal response and explanatory variables

The identification of sets of co-regulated genes that share a common fun...

Supervised clustering of high dimensional data using regularized mixture modeling

Identifying relationships between molecular variations and their clinica...

Outcome-guided Sparse K-means for Disease Subtype Discovery via Integrating Phenotypic Data with High-dimensional Transcriptomic Data

The discovery of disease subtypes is an essential step for developing pr...

Outcome-Guided Disease Subtyping for High-Dimensional Omics Data

High-throughput microarray and sequencing technology have been used to i...

Parea: multi-view ensemble clustering for cancer subtype discovery

Multi-view clustering methods are essential for the stratification of pa...

Bayesian Approaches for Flexible and Informative Clustering of Microbiome Data

We propose two unsupervised clustering methods that are designed for hum...
