Learning Embeddings from Cancer Mutation Sets for Classification Tasks
Analysis of somatic mutation profiles from cancer patients is essential to cancer research. However, most mutations occur at low frequency and mutation rates vary widely across patients. This makes the data extremely challenging to analyze statistically and difficult to use for classification, clustering, visualization, or other learning tasks. Low-dimensional representations of somatic mutation profiles that retain useful information about the DNA of cancer cells would therefore make such data easier to use in applications that advance precision medicine. In this paper, we address the open problem of learning from somatic mutations and present Flatsomatic, a method that uses variational autoencoders (VAEs) to create latent representations of somatic mutation profiles. Our results show promise for this approach: the VAE embeddings outperform PCA on a clustering task and perform as well as the raw high-dimensional data on a classification task. We believe the methods presented here can be of great value in future research and in bringing data-driven models into precision oncology.
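The abstract does not specify Flatsomatic's architecture or likelihood model. The sketch below shows the generic per-sample VAE training objective (the negative evidence lower bound) in plain Python. It assumes three things that are not confirmed by the abstract: each somatic profile is a binary vector of per-gene mutation indicators, the decoder is Bernoulli, and the prior is a standard normal. All function names are hypothetical. A real model would compute the encoder outputs `mu`, `logvar` and the decoder probabilities `x_prob` with neural networks, then minimize this loss by gradient descent.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def bernoulli_nll(x, x_prob, eps=1e-7):
    """Reconstruction term: negative log-likelihood of binary mutation indicators
    (assumes a Bernoulli decoder, which is an illustrative choice)."""
    total = 0.0
    for xi, p in zip(x, x_prob):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= xi * math.log(p) + (1 - xi) * math.log(1.0 - p)
    return total


def gaussian_kl(mu, logvar):
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I))."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def negative_elbo(x, x_prob, mu, logvar):
    """Per-sample VAE loss: reconstruction NLL plus KL regularizer."""
    return bernoulli_nll(x, x_prob) + gaussian_kl(mu, logvar)


if __name__ == "__main__":
    x = [1, 0, 0, 1]                      # hypothetical mutation indicators for four genes
    mu, logvar = [0.2, -0.1], [0.0, -0.5]  # hypothetical encoder outputs (2-D latent space)
    z = reparameterize(mu, logvar, random.Random(0))
    x_prob = [0.9, 0.1, 0.2, 0.7]         # a real decoder would compute these from z
    print("sampled z:", z)
    print("loss:", negative_elbo(x, x_prob, mu, logvar))
```

After training, the encoder mean `mu` is commonly used as the low-dimensional embedding for downstream clustering or classification because it is deterministic. This is a general VAE convention, not a detail the abstract confirms for Flatsomatic.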