DOT: Gene-set analysis by combining decorrelated association statistics

by Olga A. Vsevolozhskaya, et al.

Historically, the majority of statistical association methods have been designed assuming the availability of SNP-level information. However, modern genetic and sequencing data present new challenges to the access and sharing of genotype-phenotype datasets, including cost management and difficulties in consolidating records across research groups. These issues make methods based on SNP-level summary statistics for a joint analysis of variants in a group particularly appealing. The most common form of combining statistics is a sum of SNP-level squared scores, possibly weighted, as in burden tests for rare variants. The overall significance of the resulting statistic is evaluated using its distribution under the null hypothesis. Here, we demonstrate that this basic approach can be substantially improved by decorrelating the scores before adding them, resulting in remarkable power gains in the situations most commonly encountered in practice, namely, when effect sizes are heterogeneous and pairwise LD values are diverse. In these situations, the power of the traditional test, based on the sum of squared scores, quickly reaches a ceiling as the number of variants increases. Thus, the traditional approach does not benefit from information potentially contained in any additional SNPs, while our decorrelation by orthogonal transformation (DOT) method yields a steady gain in power. We present theoretical and computational analyses of both approaches and reveal the causes behind the sometimes dramatic difference in their respective powers. We showcase DOT by analyzing breast cancer data, in which our method strengthened previously reported associations and implied the possibility of multiple new alleles that jointly confer breast cancer risk.
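
The contrast between the two statistics described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify which decorrelating transformation DOT uses, so the code below assumes a Cholesky factor of the LD matrix as the whitening step. Any transformation that whitens the scores yields the same sum of squares, namely the quadratic form Z'Σ⁻¹Z. The Z-scores, the LD matrix, and all function names are hypothetical and chosen only for illustration.

```python
import math


def cholesky(sigma):
    """Return lower-triangular L with L L^T = sigma (sigma must be positive definite)."""
    n = len(sigma)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                low[i][j] = (sigma[i][j] - s) / low[j][j]
    return low


def decorrelate(z, sigma):
    """Solve L x = z by forward substitution; x = L^{-1} z has identity covariance
    when z has covariance sigma. (Assumed whitening choice, see the lead-in.)"""
    low = cholesky(sigma)
    x = []
    for i in range(len(z)):
        s = sum(low[i][k] * x[k] for k in range(i))
        x.append((z[i] - s) / low[i][i])
    return x


def sum_of_squares(scores):
    """Combine scores by adding their squares."""
    return sum(v * v for v in scores)


# Hypothetical input: three SNP-level Z-scores and their LD (correlation) matrix.
z_scores = [2.1, 1.8, -0.5]
ld = [
    [1.0, 0.6, 0.2],
    [0.6, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]

# Traditional statistic: sum of squared (correlated) scores.
traditional_stat = sum_of_squares(z_scores)

# Decorrelated statistic: sum of squared scores after whitening.
decorrelated_stat = sum_of_squares(decorrelate(z_scores, ld))

print(traditional_stat, decorrelated_stat)
```

If the scores are multivariate normal with mean zero and covariance equal to the LD matrix under the null hypothesis, the decorrelated statistic follows a chi-square distribution with as many degrees of freedom as there are SNPs. The traditional statistic instead follows a weighted sum of chi-square variables, with the eigenvalues of the LD matrix as weights.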




Set-Based Tests for Genetic Association Using the Generalized Berk-Jones Statistic

Studying the effects of groups of Single Nucleotide Polymorphisms (SNPs)...

Integrated Quantile RAnk Test (iQRAT) for gene-level associations in sequencing studies

Testing gene-based associations is the fundamental approach to identify ...

Robust Identification of Target Genes and Outliers in Triple-negative Breast Cancer Data

Correct classification of breast cancer sub-types is of high importance ...

A Bayes Factor Approach with Informative Prior for Rare Genetic Variant Analysis from Next Generation Sequencing Data

The discovery of rare genetic variants through Next Generation Sequencin...

Deep neural network improves the estimation of polygenic risk scores for breast cancer

Polygenic risk scores (PRS) estimate the genetic risk of an individual f...

Binary and Re-search Signal Region Detection in High Dimensions

Signal region detection is one of the challenging problems in modern sta...

BRCA Gene Mutations in dbSNP: A Visual Exploration of Genetic Variants

BRCA genes, comprising BRCA1 and BRCA2 play indispensable roles in prese...
