Information Content of a Phylogenetic Tree in a Data Matrix

01/26/2018
by Tania Roy, et al.

Phylogenetic trees in genetics, and in biology in general, are all binary. We attempt to answer one fundamental question: is such binary branching, from the coarsest to the finest scales, supported by data? We convert this question into an equivalent one: where is the structural information of a tree in a data matrix? Our conceptual and computational results lead us to a negative answer: splitting each branch in two at every internal node of the tree, from the top level to the bottom, is a man-made structure. The data-driven computing paradigm Data Mechanics is employed here to reveal that the information of a tree is composed of a set of selected temperatures (or scales), each of which has a clustering composition strictly regulated by a temperature-specific cluster-sharing probability matrix. The resultant Data Cloud Geometry (DCG) tree on the space of species is proposed as the authentic structure contained in the data. In particular, each core cluster on the finest scale, the bottom level of the DCG tree, should not be further partitioned because of uniformity. Beyond the finest scale, the branching of the DCG tree is primarily based on probability, which induces an ultrametric satisfying the super triangular inequality. This ultrametric property differentiates the DCG tree from all popular trees based on the hierarchical clustering (HC) algorithm, which typically employs an empirical, often ad hoc distance measure. Since this measure is regulated by the triangular inequality, it is not capable of producing a "flat" branch, in which all members (more than two) are equidistant from one another. We demonstrate such information content first on an illustrative zoo data set, and then on two genomic data sets.
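
To make the distinction between the ordinary triangular inequality and the super triangular (ultrametric) inequality concrete, here is a minimal sketch in Python. It is not the DCG algorithm and is not taken from the paper. The function name `is_ultrametric` and the two small distance matrices are assumptions made up for illustration. The check is the standard ultrametric condition d(i, k) <= max(d(i, j), d(j, k)) for all triples.

```python
import numpy as np


def is_ultrametric(d, tol=1e-12):
    """Return True if the square distance matrix d satisfies
    d[i, k] <= max(d[i, j], d[j, k]) for every triple (i, j, k).

    Hypothetical helper for illustration only.
    """
    d = np.asarray(d, dtype=float)
    # pairwise_max[i, j, k] = max(d[i, j], d[j, k])
    pairwise_max = np.maximum(d[:, :, None], d[None, :, :])
    # Tightest bound on d[i, k] over all intermediate points j
    bound = pairwise_max.min(axis=1)
    return bool(np.all(d <= bound + tol))


# Illustrative "flat" branch: three members mutually at distance 1,
# plus one outsider at distance 2 from each of them.
flat_branch = np.array([
    [0, 1, 1, 2],
    [1, 0, 1, 2],
    [1, 1, 0, 2],
    [2, 2, 2, 0],
])

# Three points on a line at positions 0, 1 and 3: a valid metric
# (triangular inequality holds) that is not an ultrametric.
line_points = np.array([
    [0, 1, 3],
    [1, 0, 2],
    [3, 2, 0],
])

flat_ok = is_ultrametric(flat_branch)
line_ok = is_ultrametric(line_points)
print(flat_ok, line_ok)
```

In the first matrix the three equidistant members form a flat branch that is consistent with the ultrametric condition. The second matrix obeys the triangular inequality (3 <= 1 + 2) but violates the stronger condition (3 > max(1, 2)).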


research
04/06/2017

An Online Hierarchical Algorithm for Extreme Clustering

Many modern clustering methods scale well to a large number of data item...
research
02/13/2020

Tree-SNE: Hierarchical Clustering and Visualization Using t-SNE

t-SNE and hierarchical clustering are popular methods of exploratory dat...
research
04/27/2019

About Fibonacci trees. I

In this first paper, we look at the following question: are the properti...
research
03/03/2023

Contrastive Hierarchical Clustering

Deep clustering has been dominated by flat models, which split a dataset...
research
05/24/2023

Hierarchical clustering with dot products recovers hidden tree structure

In this paper we offer a new perspective on the well established agglome...
research
07/20/2015

Clustering Tree-structured Data on Manifold

Tree-structured data usually contain both topological and geometrical in...
research
07/29/2020

Extreme-K categorical samples problem

With histograms as its foundation, we develop Categorical Exploratory Da...
