Information geometry for phylogenetic trees
We propose a new space for modeling phylogenetic trees, based on a biologically motivated Markov model for genetic sequence evolution. As a point set, this space contains the previously developed Billera-Holmes-Vogtmann (BHV) tree space, while its geometry is motivated by the edge-product space. Like the edge-product space, our new wald space also includes disconnected forests, but it does not contain certain singularities of that space. The geometry of wald space is that of the Fisher information metric of character distributions, either from a discrete Bernoulli model or from a continuous Gaussian model. The latter can be viewed as the trace metric of the affine-invariant metric for covariance matrices; the former is that of the Hellinger divergence or, as we show, equivalent to any metric obtained from an f-divergence, such as the Jensen-Shannon metric. For the continuous model we derive a gradient descent algorithm to project from the ambient space of covariance matrices to wald space. For both models we derive methods to compute geodesics in polynomial time, and we show numerically that the two information geometries (discrete and continuous) are very similar. In particular, geodesics are approximated extrinsically. Comparison with the BHV geometry shows that our canonical and biologically motivated space is substantially different.
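As a small illustration of the Hellinger divergence named in the abstract, the following sketch computes the standard Hellinger distance between two discrete probability distributions, here two Bernoulli distributions. It is not the paper's construction of wald space or of its geodesics; the function names `hellinger_distance` and `bernoulli` are hypothetical and chosen only for this example.

```python
import math


def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions.

    p and q are sequences of probabilities over the same finite set.
    The result lies in [0, 1]: 0 for identical distributions,
    1 for distributions with disjoint support.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


def bernoulli(theta):
    """Probability vector (P[X=0], P[X=1]) of a Bernoulli(theta) variable.

    Hypothetical helper for this illustration only.
    """
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    return [1.0 - theta, theta]


if __name__ == "__main__":
    # Two Bernoulli distributions with nearby parameters.
    print(hellinger_distance(bernoulli(0.2), bernoulli(0.3)))
    # Disjoint support gives the maximal distance.
    print(hellinger_distance(bernoulli(0.0), bernoulli(1.0)))
```

The distance is symmetric in its arguments and vanishes exactly when the two distributions coincide.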