rfPhen2Gen: A machine learning based association study of brain imaging phenotypes to genotypes

by Muhammad Ammar Malik, et al.

Imaging genetic studies aim to find associations between genetic variants and imaging quantitative traits (QTs). Traditional genome-wide association studies (GWAS) are based on univariate statistical tests, but when multiple traits are analyzed together they suffer from a multiple-testing problem and do not account for correlations among the traits. An alternative approach to multi-trait GWAS is to reverse the functional relation between genotypes and traits by fitting a multivariate regression model that predicts genotypes from multiple traits simultaneously. However, current reverse genotype prediction approaches are mostly based on linear models. Here, we evaluated random forest regression (RFR) as a method to predict single-nucleotide polymorphisms (SNPs) from imaging QTs and identify biologically relevant associations. We trained machine learning models to predict 518,484 SNPs using 56 brain imaging QTs. We observed that genotype regression error is a better indicator of permutation p-value significance than genotype classification accuracy. SNPs within the known Alzheimer disease (AD) risk gene APOE had the lowest root-mean-square error (RMSE) for lasso and random forest, but not for ridge regression. Moreover, random forests identified additional SNPs that were not prioritized by the linear models but are known to be associated with brain-related disorders. Feature selection identified well-known brain regions associated with AD, like the hippocampus and amygdala, as important predictors of the most significant SNPs. In summary, our results indicate that non-linear methods like random forests may offer additional insights into phenotype-genotype associations compared to traditional linear multivariate GWAS methods.
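The reverse-prediction idea summarized above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes scikit-learn's `RandomForestRegressor`, synthetic data in place of real imaging QTs and genotypes, and illustrative hyperparameters. The helper names `cv_rmse` and `permutation_pvalue` are hypothetical, and impurity-based feature importances stand in for whatever feature selection procedure the paper actually used. For a single SNP, the sketch predicts the genotype (coded as 0/1/2) from 56 traits, scores it by cross-validated RMSE, and compares that RMSE against a null distribution obtained by permuting the genotype.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict


def cv_rmse(traits, genotype, seed=0):
    """Cross-validated RMSE when predicting one SNP from all traits."""
    # Hyperparameters are illustrative assumptions, not the paper's settings.
    model = RandomForestRegressor(n_estimators=25, random_state=seed)
    predicted = cross_val_predict(model, traits, genotype, cv=3)
    return float(np.sqrt(np.mean((genotype - predicted) ** 2)))


def permutation_pvalue(traits, genotype, n_perm=19, seed=0):
    """Empirical p-value: how often a permuted genotype is predicted
    at least as well (RMSE at least as low) as the observed one."""
    rng = np.random.default_rng(seed)
    observed = cv_rmse(traits, genotype, seed)
    null_rmse = [
        cv_rmse(traits, rng.permutation(genotype), seed)
        for _ in range(n_perm)
    ]
    hits = sum(r <= observed for r in null_rmse)
    # The +1 terms keep the estimate away from an exact zero.
    return observed, (1 + hits) / (n_perm + 1)


# Synthetic stand-in data (assumption): 150 subjects, 56 imaging traits.
rng = np.random.default_rng(42)
n_samples, n_traits = 150, 56
traits = rng.normal(size=(n_samples, n_traits))

# One SNP that depends non-linearly on trait 0, coded as 0/1/2.
associated_snp = np.digitize(np.abs(traits[:, 0]), [0.5, 1.2]).astype(float)
# One SNP that is unrelated to every trait.
null_snp = rng.integers(0, 3, size=n_samples).astype(float)

rmse_assoc, p_assoc = permutation_pvalue(traits, associated_snp)
rmse_null, p_null = permutation_pvalue(traits, null_snp)
print(f"associated SNP: RMSE={rmse_assoc:.3f}, p={p_assoc:.3f}")
print(f"null SNP:       RMSE={rmse_null:.3f}, p={p_null:.3f}")

# Rank traits for the associated SNP by impurity-based importance.
full_model = RandomForestRegressor(n_estimators=25, random_state=0)
full_model.fit(traits, associated_snp)
top_trait = int(np.argmax(full_model.feature_importances_))
print("most important trait index:", top_trait)
```

In a genome-wide setting this procedure would be repeated for every SNP, so the number of permutations and trees would need to be chosen with the total compute budget in mind. With only 19 permutations the smallest attainable p-value is 0.05, which is enough for illustration only.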




High-dimensional multi-trait GWAS by reverse prediction of genotypes

Multi-trait genome-wide association studies (GWAS) use multi-variate sta...

Integrating omics and MRI data with kernel-based tests and CNNs to identify rare genetic markers for Alzheimer's disease

For precision medicine and personalized treatment, we need to identify p...

Machine Learning Workflow to Explain Black-box Models for Early Alzheimer's Disease Classification Evaluated for Multiple Datasets

Purpose: Hard-to-interpret Black-box Machine Learning (ML) were often us...

Applying Machine Learning To Maize Traits Prediction

Heterosis is the improved or increased function of any biological qualit...

Ridge-penalized adaptive Mantel test and its application in imaging genetics

We propose a ridge-penalized adaptive Mantel test (AdaMant) for evaluati...

Are NBA players getting paid according to their performance on court?

It is customary for researchers and practitioners to fit linear models i...

A Multivariate Regression Approach to Association Analysis of Quantitative Trait Network

Many complex disease syndromes such as asthma consist of a large number ...
