Inference of Polygenic Factors Associated with Breast Cancer Gene Interaction Networks from Discrete Data Utilizing Poisson Multivariate Mutual Information

02/06/2020
by Jeremie Fish, et al.

In this work we introduce a new methodology to infer, from gene expression data, the complex interactions associated with polygenic diseases, which remain a major frontier in understanding the factors that affect human health. In many cases disease may be related to the covariance of several genes, rather than simply the variance of a single gene, making network inference crucial to the development of potential treatments. Specifically, we investigate the network of factors and associations involved in developing breast cancer from gene expression data. Our approach is information theoretic, but a major obstacle has been the discrete nature of such data, which is well described as a multivariate Poisson process. Although mutual information is generally a well-regarded approach for developing networks of association in the data science of complex systems across many disciplines, until now a good method to accurately and efficiently compute entropies from such processes has been lacking. Nonparametric methods such as the popular k-nearest neighbors (KNN) methods are slow to converge and thus require unrealistic amounts of data. We use the causation entropy (CSE) principle, together with the associated greedy search algorithm optimal CSE (oCSE), as a network inference method to deduce the actual structure, with the multivariate Poisson estimator developed here as the core computational engine. We show that the Poisson version of oCSE outperforms both the Kraskov-Stögbauer-Grassberger (KSG) oCSE method (which is a KNN method for estimating the entropy) and the Gaussian oCSE method on synthetic data. We present the results for a breast cancer gene expression data set.
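To make the idea of mutual information between Poisson-distributed counts concrete, here is a minimal Python sketch. It is not the estimator developed in the paper. It assumes a two-variable "common shock" construction, X1 = Y1 + Y12 and X2 = Y2 + Y12 with independent Poisson terms, which the abstract does not specify, and it evaluates the mutual information by brute-force summation over a truncated grid of counts. The function names, the rate parameters `lam1`, `lam2`, `lam12`, and the truncation `max_count` are all hypothetical choices made for illustration.

```python
import math


def poisson_pmf(k, lam):
    """P(K = k) for K ~ Poisson(lam), evaluated in log space."""
    if lam == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def bivariate_poisson_pmf(x1, x2, lam1, lam2, lam12):
    """Joint pmf of X1 = Y1 + Y12 and X2 = Y2 + Y12 (assumed model).

    Y1, Y2, Y12 are independent Poisson variables with rates
    lam1, lam2, lam12; the shared term Y12 couples X1 and X2.
    """
    return sum(
        poisson_pmf(x1 - k, lam1)
        * poisson_pmf(x2 - k, lam2)
        * poisson_pmf(k, lam12)
        for k in range(min(x1, x2) + 1)
    )


def poisson_mutual_information(lam1, lam2, lam12, max_count=60):
    """I(X1; X2) in nats, summed over counts 0..max_count.

    The truncation is an approximation that is only reasonable when
    the rates are small compared with max_count.
    """
    mi = 0.0
    for x1 in range(max_count + 1):
        p1 = poisson_pmf(x1, lam1 + lam12)  # marginal of X1
        for x2 in range(max_count + 1):
            p2 = poisson_pmf(x2, lam2 + lam12)  # marginal of X2
            p12 = bivariate_poisson_pmf(x1, x2, lam1, lam2, lam12)
            if p12 > 0.0 and p1 * p2 > 0.0:
                mi += p12 * math.log(p12 / (p1 * p2))
    return mi


if __name__ == "__main__":
    # No shared term: the two counts are independent.
    print(poisson_mutual_information(2.0, 2.0, 0.0))
    # A shared term couples the two counts.
    print(poisson_mutual_information(2.0, 2.0, 1.0))
```

In this toy model the mutual information vanishes when `lam12` is zero and grows as the shared rate increases. A network inference method in the spirit of oCSE would apply conditional versions of such quantities to rates estimated from data, which this sketch does not attempt.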
