A nonparametric Bayesian approach to simultaneous subject and cell heterogeneity discovery for single cell RNA-seq data
The advent of single-cell sequencing opens new avenues for personalized treatment. An important first step is to characterize subject heterogeneity at single-cell resolution. In this article, we address the two-level clustering problem of simultaneously discovering subject subgroups (subject level) and detecting cell types (cell level) from scRNA-seq data collected from multiple subjects. Current statistical approaches either cluster cells without accounting for subject heterogeneity or group subjects without using single-cell information. To fill this gap between cell clustering and subject grouping, we develop a nonparametric Bayesian model, SCSC (Subject and Cell clustering for Single-Cell expression data), which clusters subjects and cells at the same time. SCSC does not require the number of subject subgroups or the number of cell types to be prespecified, automatically induces subject subgroup structures, matches cell types across subjects, and directly models raw scRNA-seq count data by explicitly accounting for dropouts, library sizes, and over-dispersion. We propose a computationally efficient blocked Gibbs sampler for posterior inference. Simulation studies and an application to a multi-subject iPSC scRNA-seq dataset demonstrate that SCSC can discover both subject and cell heterogeneity.
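To illustrate how a nonparametric Bayesian prior can avoid fixing the number of clusters in advance, the following is a minimal sketch of truncated stick-breaking weights, a construction commonly paired with blocked Gibbs samplers. The abstract does not state which prior SCSC uses, so the Dirichlet-process-style construction, the helper name `stick_breaking_weights`, and the values of `alpha` and `truncation` are all assumptions made for illustration, not the authors' specification.

```python
import random

def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking weights (hypothetical helper, not from the paper)."""
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to a component
    for k in range(truncation):
        # The last break takes the whole remainder so the weights sum to 1.
        frac = 1.0 if k == truncation - 1 else rng.betavariate(1.0, alpha)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    return weights

rng = random.Random(0)
# Assumed values: concentration alpha = 1.0 and an upper bound of 20 components.
weights = stick_breaking_weights(alpha=1.0, truncation=20, rng=rng)

# Assign 200 hypothetical cells to components according to the weights.
labels = rng.choices(range(len(weights)), weights=weights, k=200)
n_occupied = len(set(labels))  # number of components actually used
print(n_occupied)
```

The truncation level is only an upper bound: the number of occupied components is determined by the draws, which is the sense in which the cluster count need not be prespecified under this assumed construction.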