Ultrafast learning of 4-node hybridization cycles in phylogenetic networks using algebraic invariants

by Zhaoxing Wu, et al.

The abundance of gene flow in the Tree of Life challenges the notion that evolution can be represented as a fully bifurcating process, since such a process cannot capture important biological realities like hybridization, introgression, or horizontal gene transfer. Coalescent-based network methods are increasingly popular, but they do not scale to large datasets because they require a heuristic search over the space of networks together with numerical optimization, which can be NP-hard. Here, we introduce a novel method to reconstruct phylogenetic networks based on algebraic invariants. While there is a long tradition of using algebraic invariants in phylogenetics, our work is the first to define phylogenetic invariants on concordance factors (frequencies of 4-taxon splits in the input gene trees) to identify level-1 phylogenetic networks under the multispecies coalescent model. Our inference methodology is optimization-free: it only requires the evaluation of polynomial equations, so it bypasses the traversal of network space and runs at least 10 times faster than the fastest network methods to date. We illustrate the accuracy and speed of our new method on a variety of simulated scenarios, as well as in the estimation of a phylogenetic network for the genus Canis. We implement our theory in an open-source, publicly available Julia package, phylo-diamond.jl, with broad applicability within the evolutionary biology community.
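To make the notion of a concordance factor concrete, the following is a minimal sketch in Python (not the authors' Julia implementation) that computes the frequency of each of the three possible 4-taxon splits from a list of observed gene-tree splits. The function name `concordance_factors`, the string encoding of splits, and the toy input are all assumptions made for illustration; they are not taken from phylo-diamond.jl.

```python
from collections import Counter

# The three possible unrooted splits on four taxa a, b, c, d.
# This string encoding is an illustrative assumption.
SPLITS = ("ab|cd", "ac|bd", "ad|bc")


def concordance_factors(quartet_splits):
    """Return the frequency of each 4-taxon split among the observed gene trees."""
    counts = Counter(quartet_splits)
    unknown = set(counts) - set(SPLITS)
    if unknown:
        raise ValueError(f"unrecognized splits: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no gene trees supplied")
    # Each concordance factor is a simple proportion; the three sum to 1.
    return {s: counts[s] / total for s in SPLITS}


# Hypothetical toy input: ten gene trees restricted to taxa a, b, c, d.
observed = ["ab|cd"] * 6 + ["ac|bd"] * 3 + ["ad|bc"] * 1
cf = concordance_factors(observed)
```

Quantities of this kind, one triple per set of four taxa, are the inputs on which the polynomial invariants described in the abstract would be evaluated; the invariants themselves are not reproduced here.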




