Multi-view Banded Spectral Clustering with application to ICD9 clustering

04/06/2018
by Luwan Zhang, et al.

Despite recent methodological developments, community detection remains a challenging problem. The existing literature largely focuses on the standard setting in which a network is learned from an observed adjacency matrix drawn from a single data source. Constructing a shared network from multiple data sources is more challenging because of heterogeneity across populations. Additionally, when a natural ordering arises on the nodes of interest, no existing method takes that information into account. Motivated by the problem of grouping International Classification of Diseases, Ninth Revision (ICD9) codes to represent clinically meaningful phenotypes, we propose a novel spectral clustering method that optimally combines multiple data sources while leveraging prior knowledge of the ordering. The proposed method combines a banding step, which encourages a desired moving average structure, with a subsequent weighting step, which maximizes consensus across the sources. Its statistical performance is studied under a multi-view stochastic block model. We also provide a simple rule for choosing the weights in practice. The efficacy and robustness of the method are demonstrated through extensive simulations. Finally, we apply the method to the ICD9 coding system and obtain an insightful clustering structure by integrating information from a large claims database and two healthcare systems.
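
The two-step idea in the abstract (band each view, then take a weighted combination and cluster spectrally) can be illustrated with a rough sketch. This is not the authors' estimator: the function names `band`, `combine_views`, and `spectral_labels`, the fixed `bandwidth`, the equal weights, and the small k-means loop are all assumptions made for illustration, and the paper's rule for choosing weights is not reproduced here.

```python
import numpy as np


def band(matrix, bandwidth):
    """Zero out entries more than `bandwidth` positions away from the diagonal."""
    idx = np.arange(matrix.shape[0])
    mask = np.abs(idx[:, None] - idx[None, :]) <= bandwidth
    return np.where(mask, matrix, 0.0)


def combine_views(views, weights, bandwidth):
    """Band every view, then form a weighted average (weights are normalized)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * band(v, bandwidth) for w, v in zip(weights, views))


def spectral_labels(similarity, n_clusters, n_iter=50):
    """Cluster nodes using the eigenvectors with the largest-magnitude eigenvalues."""
    vals, vecs = np.linalg.eigh(similarity)  # similarity is assumed symmetric
    top = np.argsort(np.abs(vals))[-n_clusters:]
    embedding = vecs[:, top]

    # Farthest-point initialization keeps this toy k-means deterministic.
    centers = [embedding[0]]
    for _ in range(1, n_clusters):
        dists = np.min(
            [np.sum((embedding - c) ** 2, axis=1) for c in centers], axis=0
        )
        centers.append(embedding[np.argmax(dists)])
    centers = np.array(centers)

    for _ in range(n_iter):
        dist = ((embedding[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = embedding[labels == k].mean(axis=0)
    return labels


# Toy data: six ordered nodes in two contiguous blocks, observed in two views
# that differ only in signal strength (hypothetical values).
block = np.kron(np.eye(2), np.ones((3, 3)))
views = [block, 0.5 * block]
combined = combine_views(views, weights=[1.0, 1.0], bandwidth=2)
labels = spectral_labels(combined, n_clusters=2)
```

In this sketch the banding mask encodes the assumption that nodes far apart in the ordering are unlikely to share a cluster, while the weights control how much each source contributes to the combined matrix. In practice one would likely swap the toy k-means loop for a library implementation.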
