A Fast, Accurate Two-Step Linear Mixed Model for Genetic Analysis Applied to Repeat MRI Measurements

by Qifan Yang, et al.

Large-scale biobanks are being collected around the world to better understand human health and risk factors for disease. They often survey hundreds of thousands of individuals, combining questionnaires with clinical, genetic, demographic, and imaging assessments; some of these data may be collected longitudinally. Genetic association analysis of such datasets requires methods that properly handle relatedness, population structure, and other biases introduced by confounders. The most popular and accurate approaches rely on linear mixed model (LMM) algorithms, which are iterative, and the computational complexity of each iteration scales with the square of the sample size. This slows the pace of discovery (up to several days for a single-trait analysis) and limits the use of repeated phenotypic measurements. Here, we describe a new, non-iterative, much faster, and accurate Two-Step Linear Mixed Model (2sLMM) approach whose computational complexity scales linearly with sample size. We show that the first step yields accurate estimates of heritability (the proportion of trait variance explained by additive genetic factors), even when increasingly complex genetic relationships between individuals are modeled. The second step provides a faster framework for estimating the effect sizes of covariates in the regression model. We applied 2sLMM to real data from the UK Biobank, which recently released genotyping information and processed MRI data from 9,725 individuals. Using left and right hippocampal volume (HV) as repeated measures, we observed higher and more accurate heritability estimates, consistent with simulations.
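To make the "estimate heritability first, then covariate effects" structure concrete, here is a minimal sketch of a generic non-iterative two-step LMM fit. It is not the authors' 2sLMM: it assumes a single-component model y = Xβ + g + e with g ~ N(0, σg²K) and e ~ N(0, σe²I), where K is a genetic relationship matrix (GRM). Step 1 swaps in a method-of-moments regression (in the spirit of Haseman-Elston regression) of squared, rotated OLS residuals on the eigenvalues of K; step 2 is a plug-in generalized least squares fit. Because it relies on a full eigendecomposition (O(n³)), this sketch does not reproduce the linear scaling described in the abstract. The names `two_step_lmm_sketch` and `simulate` are hypothetical, and the simulated data are synthetic.

```python
import numpy as np


def two_step_lmm_sketch(y, X, K):
    """Illustrative non-iterative two-step LMM fit (hypothetical, not the authors' 2sLMM).

    Assumed model: y = X @ beta + g + e, g ~ N(0, sg2 * K), e ~ N(0, se2 * I).
    Returns (h2, beta_gls, sg2, se2).
    """
    # Rotate into the eigenbasis of K, where Var(U^T y) = diag(sg2 * s + se2).
    s, U = np.linalg.eigh(K)
    s = np.clip(s, 0.0, None)  # drop tiny negative eigenvalues caused by round-off
    y_rot, X_rot = U.T @ y, U.T @ X

    # Step 1: method-of-moments variance components (no iterations).
    # Residualize on covariates with OLS, then regress squared rotated residuals
    # on the eigenvalues, using E[r_i^2] ~= sg2 * s_i + se2 (this ignores the
    # small degrees-of-freedom correction from fitting X).
    beta_ols, *_ = np.linalg.lstsq(X_rot, y_rot, rcond=None)
    r2 = (y_rot - X_rot @ beta_ols) ** 2
    A = np.column_stack([np.ones_like(s), s])
    (se2, sg2), *_ = np.linalg.lstsq(A, r2, rcond=None)
    sg2, se2 = max(sg2, 0.0), max(se2, 1e-8)  # keep variance estimates admissible
    h2 = sg2 / (sg2 + se2)

    # Step 2: plug-in GLS, which is weighted least squares in the rotated space.
    w = 1.0 / (sg2 * s + se2)
    XtW = X_rot.T * w  # equals X_rot^T @ diag(w)
    beta_gls = np.linalg.solve(XtW @ X_rot, XtW @ y_rot)
    return h2, beta_gls, sg2, se2


def simulate(n=1500, m=400, h2=0.5, beta=(1.0, 0.5), seed=0):
    """Synthetic genotypes and phenotype with known h2 (hypothetical test data)."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.05, 0.5, size=m)
    G = rng.binomial(2, freqs, size=(n, m)).astype(float)  # allele counts 0/1/2
    Z = (G - G.mean(axis=0)) / G.std(axis=0)  # standardized genotypes
    K = Z @ Z.T / m  # GRM
    X = np.column_stack([np.ones(n), rng.standard_normal(n)])  # intercept + one covariate
    g = Z @ rng.normal(0.0, np.sqrt(h2 / m), size=m)  # Var(g) = h2 * K
    e = rng.normal(0.0, np.sqrt(1.0 - h2), size=n)
    y = X @ np.asarray(beta) + g + e
    return y, X, K


if __name__ == "__main__":
    y, X, K = simulate()
    h2_hat, beta_hat, _, _ = two_step_lmm_sketch(y, X, K)
    print("h2 estimate:", round(float(h2_hat), 3))
    print("covariate effects:", np.round(beta_hat, 3))
```

The eigenbasis rotation is what makes both steps closed-form here: once the phenotypic covariance is diagonal, variance components come from a single linear regression and covariate effects from a single weighted solve, with no REML iterations.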

