Variational inference for rare variant detection in deep, heterogeneous next-generation sequencing data

04/14/2016
by Fan Zhang, et al.

Detecting rare variants is important for understanding genetic heterogeneity in mixed samples. Next-generation sequencing (NGS) technologies have recently enabled high-resolution identification of single nucleotide variants (SNVs) in such samples. However, the noise inherent in the biological processes involved in NGS means that statistical methods are needed to distinguish true rare variants from errors. We propose a novel Bayesian statistical model and a variational expectation-maximization (EM) algorithm to estimate the non-reference allele frequency (NRAF) and identify SNVs in heterogeneous cell populations. We demonstrate that our variational EM algorithm achieves sensitivity and specificity comparable to those of a Markov chain Monte Carlo (MCMC) sampling inference algorithm, and that it is more computationally efficient in tests on low-coverage data (27× and 298×). Furthermore, we show that our model with the variational EM inference algorithm has higher specificity than many state-of-the-art algorithms. In an analysis of a longitudinal directed-evolution yeast data set, we identify a time-series trend in non-reference allele frequency and detect variants that have not previously been reported. Our model also detects the emergence of a beneficial variant earlier than was previously shown, as well as a pair of concomitant variants.
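To make the quantity being estimated concrete, the following is a minimal sketch of NRAF estimation at a single position. It is not the model or the variational EM algorithm proposed in the paper: it swaps in a simple conjugate Beta-Binomial model, and every name in it (`nraf_posterior`, `posterior_mean`, `prob_sample_exceeds_control`, the uniform prior, the read counts) is a hypothetical choice made for illustration. It only shows the general idea of comparing a sample's non-reference allele frequency against a control under sequencing noise.

```python
import random


def nraf_posterior(non_ref_reads, depth, prior_a=1.0, prior_b=1.0):
    """Beta posterior parameters (a, b) for the NRAF at one position.

    Assumes a Binomial likelihood with a Beta(prior_a, prior_b) prior.
    This is an illustrative stand-in, not the paper's model.
    """
    if not 0 <= non_ref_reads <= depth:
        raise ValueError("non_ref_reads must lie between 0 and depth")
    return prior_a + non_ref_reads, prior_b + (depth - non_ref_reads)


def posterior_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def prob_sample_exceeds_control(sample, control, draws=20000, seed=0):
    """Monte Carlo estimate of P(NRAF_sample > NRAF_control).

    `sample` and `control` are (a, b) Beta posterior parameters.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        if rng.betavariate(*sample) > rng.betavariate(*control):
            hits += 1
    return hits / draws


# Hypothetical counts: 3 of 300 non-reference reads in the control
# (background error) versus 30 of 300 in the sample.
control = nraf_posterior(3, 300)
sample = nraf_posterior(30, 300)
print(posterior_mean(*control), posterior_mean(*sample))
print(prob_sample_exceeds_control(sample, control))
```

A position would be flagged as a candidate variant when this probability is close to 1. The cutoff is a design choice that trades sensitivity against specificity, the two quantities the abstract uses to compare inference algorithms.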


