XOmiVAE: an interpretable deep learning model for cancer classification using high-dimensional omics data
Deep learning based approaches have proven promising for modelling omics data. However, one of their current limitations compared to statistical and traditional machine learning approaches is the lack of explainability, which not only reduces reliability but also limits the potential for acquiring novel knowledge from unpicking the "black-box" models. Here we present XOmiVAE, a novel interpretable deep learning model for cancer classification using high-dimensional omics data. XOmiVAE can obtain the contribution values of each gene and latent dimension for a specific prediction, as well as the correlation between genes and latent dimensions. We also show that XOmiVAE can explain both the supervised classification and the unsupervised clustering results of the deep learning network. To the best of our knowledge, XOmiVAE is one of the first activation-based deep learning interpretation methods to explain novel clusters generated by variational autoencoders. The results generated by XOmiVAE were validated by both biomedical knowledge and the performance of downstream tasks. XOmiVAE explanations of deep learning based cancer classification and clustering aligned with current domain knowledge, including biological annotation and literature, which shows great potential for novel biomedical knowledge discovery from deep learning models. The top genes and dimensions selected by XOmiVAE showed a significant influence on the performance of cancer classification. Additionally, we offer important steps to consider when interpreting deep learning models for tumour classification. For instance, we demonstrate the importance of choosing background samples that make biological sense and the limitations of connection weight based methods for explaining latent dimensions.