A structure theorem for tree-based phylogenetic networks
Attempting to recognize a tree inside a phylogenetic network is a fundamental undertaking in evolutionary analysis. Accordingly, the concept of "tree-based" phylogenetic networks, introduced by Francis and Steel, has attracted much attention from theoretical biologists in the last few years. In this context, spanning trees of a certain kind called "subdivision trees" play an essential role, and there are many important computational problems about them whose time complexity remains unclear. Against this backdrop, the present paper aims to provide a graph-theoretical framework for solving different problems on subdivision trees in a simple and unified manner. To this end, we focus on a structure called the maximal zig-zag trail decomposition, which is inherent in any rooted binary phylogenetic network N, and prove a structure theorem that characterizes the collection of all subdivision trees of N. Our theorem not only implies and unifies various results in the literature but also yields linear-time (for enumeration, linear-delay) algorithms for the following problems: given a rooted binary phylogenetic network N, 1) determine whether or not N has a subdivision tree and find one if one exists (decision/search problem); 2) compute the number of subdivision trees of N (counting problem); 3) list all subdivision trees of N (enumeration problem); and 4) find a subdivision tree that maximizes or minimizes a prescribed objective function (optimization problem). Importantly, the results and algorithms in this paper still hold true for some non-binary phylogenetic networks, and this generalization gives a partial answer to an open question posed by Pons, Semple, and Steel. We also mention some statistical applications and further research directions.
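To make the decision, counting, and enumeration problems concrete, the following is a minimal brute-force sketch in Python. It is not the linear-time method of the paper and does not use the maximal zig-zag trail decomposition; it simply tries every way of keeping one incoming arc per reticulation, which takes exponential time in the number of reticulations. It assumes the usual description of a subdivision tree as a spanning subgraph in which every reticulation keeps exactly one incoming arc and every non-leaf vertex keeps at least one outgoing arc. The function name `subdivision_trees`, the arc-list input format, and the two example networks are hypothetical and chosen only for illustration.

```python
from itertools import product


def subdivision_trees(arcs):
    """Yield every subdivision tree of a rooted network as a frozenset of arcs.

    Brute force (an illustrative assumption, not the paper's algorithm):
    keep exactly one incoming arc per reticulation and accept the choice
    if every non-leaf vertex still has at least one outgoing arc.
    """
    in_arcs, out_arcs = {}, {}
    for u, v in arcs:
        out_arcs.setdefault(u, []).append((u, v))
        in_arcs.setdefault(v, []).append((u, v))
        out_arcs.setdefault(v, [])
        in_arcs.setdefault(u, [])

    # Reticulations are the vertices with two or more incoming arcs.
    reticulations = [v for v, ins in in_arcs.items() if len(ins) >= 2]
    # The single arc entering any other non-root vertex is always kept.
    forced = [a for ins in in_arcs.values() if len(ins) == 1 for a in ins]
    # Non-leaf vertices are those with at least one outgoing arc in the network.
    internal = [v for v, outs in out_arcs.items() if outs]

    for choice in product(*(in_arcs[h] for h in reticulations)):
        kept = set(forced) | set(choice)
        tails = {u for u, _ in kept}
        if all(v in tails for v in internal):
            yield frozenset(kept)


# Hypothetical example 1: one reticulation h below tree vertices a and b.
tree_based = [("r", "a"), ("r", "b"), ("a", "h"), ("b", "h"),
              ("a", "x"), ("b", "y"), ("h", "z")]

# Hypothetical example 2: p1 and p3 each have a single child (h1 and h2),
# so those arcs must be kept, which leaves the tree vertex p2 childless.
not_tree_based = [("r", "a"), ("r", "m"), ("m", "b"), ("m", "c"),
                  ("a", "p1"), ("a", "l"), ("b", "p1"), ("b", "p3"),
                  ("c", "p3"), ("c", "p2"), ("p1", "h1"), ("p2", "h1"),
                  ("p2", "h2"), ("p3", "h2"), ("h1", "x1"), ("h2", "x2")]

count_tree_based = len(list(subdivision_trees(tree_based)))
has_tree = any(True for _ in subdivision_trees(not_tree_based))
```

Here `count_tree_based` addresses the counting problem for the first network, and `has_tree` addresses the decision problem for the second. An optimization variant could be sketched in the same style by taking `min` or `max` over the generator with a user-supplied objective on arc sets.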