Incorporating compositional heterogeneity into Lie Markov models for phylogenetic inference

by Naomi E. Hannaford, et al.

Phylogenetics uses alignments of molecular sequence data to learn about evolutionary trees. Substitutions in sequences are modelled through a continuous-time Markov process, characterised by an instantaneous rate matrix, which standard models assume is time-reversible and stationary. These assumptions are biologically questionable and induce a likelihood function which is invariant to a tree's root position. This hampers inference because a tree's biological interpretation depends critically on where it is rooted. Relaxing both assumptions, we introduce a model whose likelihood can distinguish between rooted trees. The model is non-stationary, with step changes in the instantaneous rate matrix at each speciation event. Exploiting recent theoretical work, each rate matrix belongs to a non-reversible family of Lie Markov models. These models are closed under matrix multiplication, so our extension offers the conceptually appealing property that a tree and all its sub-trees could have arisen from the same family of non-stationary models. We adopt a Bayesian approach, describe an MCMC algorithm for posterior inference and provide software. The biological insight that our model can provide is illustrated through an analysis in which non-reversible but stationary, and non-stationary but reversible models cannot identify a plausible root.
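The root-invariance claim can be checked numerically on a toy case. The following is a minimal sketch, not the authors' software: it uses the reversible F81 substitution model as a stand-in (the Lie Markov families from the abstract are not implemented here), a two-leaf tree with one observed site, and hypothetical helper names (`transition_matrix`, `f81_rate_matrix`, `pair_likelihood`) and parameter values chosen only for illustration. The total path length between the two leaves is held at 1 while the root slides along it. Under a single stationary, reversible rate matrix the likelihood should not change; when the rate matrix differs between the two branches (a step change at the speciation event), it should.

```python
import math


def mat_mul(A, B):
    """Multiply two square matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_matrix(Q, t, squarings=8, terms=12):
    """Approximate P(t) = exp(Q t) by a Taylor series with scaling and squaring."""
    n = len(Q)
    h = t / (2 ** squarings)  # scaled-down time step
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in P]
    for k in range(1, terms + 1):
        # term becomes (Q h)^k / k!
        term = [[x * h / k for x in row] for row in mat_mul(term, Q)]
        P = [[P[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        P = mat_mul(P, P)  # undo the scaling: exp(Q t) = exp(Q h)^(2^squarings)
    return P


def f81_rate_matrix(pi):
    """F81 rate matrix (illustrative stand-in): off-diagonal rate i -> j equals pi[j]."""
    n = len(pi)
    return [[pi[j] if i != j else -(1.0 - pi[i]) for j in range(n)]
            for i in range(n)]


def pair_likelihood(root_dist, Q_a, Q_b, t_a, t_b, a, b):
    """Likelihood of observing states a and b at the two leaves of a rooted
    two-leaf tree, with a separate rate matrix on each branch."""
    P_a = transition_matrix(Q_a, t_a)
    P_b = transition_matrix(Q_b, t_b)
    return sum(root_dist[x] * P_a[x][a] * P_b[x][b]
               for x in range(len(root_dist)))


# Assumed composition vectors, for illustration only.
pi = [0.1, 0.2, 0.3, 0.4]
pi_other = [0.4, 0.3, 0.2, 0.1]
Q = f81_rate_matrix(pi)
Q_other = f81_rate_matrix(pi_other)

root_positions = (0.2, 0.5, 0.8)  # branch length from root to the first leaf

# Stationary and reversible: same Q on both branches, root drawn from pi.
stationary = [pair_likelihood(pi, Q, Q, t, 1.0 - t, 0, 3)
              for t in root_positions]

# Non-stationary: the second branch evolves under a different composition.
non_stationary = [pair_likelihood(pi, Q, Q_other, t, 1.0 - t, 0, 3)
                  for t in root_positions]

print("stationary, reversible:", stationary)
print("non-stationary:        ", non_stationary)
```

In the first list the three values agree up to floating-point error, so the data carry no information about where the root lies. In the second they differ, which is the kind of signal a non-stationary model can exploit to distinguish rooted trees.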




