Markov-modulated continuous-time Markov chains to identify site- and branch-specific evolutionary variation

by Guy Baele, et al.

Markov models of character substitution on phylogenies form the foundation of phylogenetic inference frameworks. Early models made the simplifying assumption that the substitution process is homogeneous over time and across sites in the molecular sequence alignment. While standard practice adopts extensions that accommodate heterogeneity of substitution rates across sites, site-specific heterogeneity in the process over time is still frequently overlooked. This is problematic, as evolutionary processes that act at the molecular level are highly variable: different sites are subject to different selective constraints over time, which affects their substitution behaviour. We propose incorporating time variability through Markov-modulated models (MMMs), which allow the substitution process that models the evolution at an individual site (including relative character exchange rates as well as the overall substitution rate) to vary across lineages. We implement a general MMM framework in BEAST, a popular Bayesian phylogenetic inference software package, allowing researchers to compose a wide range of MMMs through flexible XML specification. Using examples from bacterial, viral and plastid genome evolution, we show that MMMs impact phylogenetic tree estimation and can substantially improve model fit compared to standard substitution models. Through simulations, we show that marginal likelihood estimation accurately identifies the generative model and does not systematically prefer the more parameter-rich MMMs. To mitigate the increased computational demands associated with MMMs, our implementation exploits recently developed updates to BEAGLE, a high-performance computational library for phylogenetic inference.
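
To make the idea of a Markov-modulated substitution process more concrete, the following is a minimal sketch in plain Python, not the BEAST implementation described above. It assumes the usual construction in which a hidden class decides which substitution model is currently active at a site, and the site can switch class along a lineage without changing its observed character. The function names `jc69` and `markov_modulated_generator`, the choice of Jukes-Cantor component models, and all rate values are hypothetical and chosen only for illustration.

```python
def jc69(rate=1.0):
    """4x4 Jukes-Cantor generator with total leaving rate `rate` per state."""
    off = rate / 3.0
    return [[-rate if i == j else off for j in range(4)] for i in range(4)]


def markov_modulated_generator(models, switch):
    """Combine K n-state generators into one (K*n)-state generator.

    State index k*n + i means "hidden class k, observed character i".
    Within class k, characters change according to models[k].
    switch[k][l] (k != l) is the assumed rate of moving from class k to
    class l while the observed character stays the same.
    """
    K = len(models)
    n = len(models[0])
    size = K * n
    Q = [[0.0] * size for _ in range(size)]
    for k in range(K):
        for i in range(n):
            row = k * n + i
            # Character substitutions inside the current class.
            for j in range(n):
                if i != j:
                    Q[row][k * n + j] = models[k][i][j]
            # Class switches that keep the character fixed.
            for l in range(K):
                if l != k:
                    Q[row][l * n + i] = switch[k][l]
            # Diagonal entry makes the row sum to zero.
            Q[row][row] = -sum(Q[row])
    return Q


# Illustrative two-class example: a slow and a fast Jukes-Cantor process.
slow, fast = jc69(0.1), jc69(2.0)
switch = [[0.0, 0.05], [0.05, 0.0]]
Q = markov_modulated_generator([slow, fast], switch)
```

Here `Q` is an 8x8 generator over (class, nucleotide) pairs. Because only the nucleotide is observed at the tips, a likelihood computation on this expanded state space would need to sum over the hidden class; that step is omitted from the sketch.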



Gradients do grow on trees: a linear-time O(N)-dimensional gradient for statistical phylogenetics

Calculation of the log-likelihood stands as the computational bottleneck...

EvoVGM: A Deep Variational Generative Model for Evolutionary Parameter Estimation

Most evolutionary-oriented deep generative models do not explicitly cons...

Efficient Bayesian Inference of General Gaussian Models on Large Phylogenetic Trees

Phylogenetic comparative methods correct for shared evolutionary history...

Beyond time-homogeneity for continuous-time multistate Markov models

Multistate Markov models are a canonical parametric approach for data mo...

Incorporating compositional heterogeneity into Lie Markov models for phylogenetic inference

Phylogenetics uses alignments of molecular sequence data to learn about ...

Neural Markov Jump Processes

Markov jump processes are continuous-time stochastic processes with a wi...

Multivariate Functional Data Modeling with Time-varying Clustering

We consider the situation where multivariate functional data has been co...
