Graph-Based Two-Sample Tests for Discrete Data

11/12/2017
by Jingru Zhang, et al.

In two-sample comparison, tests based on a graph constructed on the observations from similarity information among them are gaining attention because of their flexibility and good performance across various settings for high-dimensional and non-Euclidean data. However, when there are repeated observations or ties in the similarity graph, these graph-based tests can be problematic because they are sensitive to the choice of the similarity graph. We study two ways to fix the "tie" problem for the existing graph-based test statistics and for a new max-type statistic. Analytic p-value approximations for these extended graph-based tests are also derived and shown to work well for finite samples, allowing the tests to be applied quickly to large datasets. The new tests are illustrated in the analysis of a phone-call network dataset. All proposed tests are implemented in the R package gTests.
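
To make the idea of a graph-based two-sample test concrete, the sketch below shows the classic edge-count statistic on a minimum spanning tree (MST) with a permutation p-value. It is an illustration only: it is not the extended statistics or the analytic p-value approximations studied here, and it does not use the gTests package. All function and variable names (`mst_edges`, `between_sample_edges`, `permutation_p_value`, the toy data) are hypothetical. Note that when distances are tied, the MST returned depends on the order in which ties are broken, which is the kind of ambiguity the abstract refers to.

```python
import random

def mst_edges(dist):
    """Prim's algorithm on a dense symmetric distance matrix.

    Returns the n-1 edges of one minimum spanning tree. With tied
    distances the tree is not unique; this picks one arbitrarily.
    """
    n = len(dist)
    in_tree = [False] * n
    best = [float("inf")] * n   # cheapest known link into the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges

def between_sample_edges(edges, labels):
    """Count edges that join observations from different samples."""
    return sum(1 for u, v in edges if labels[u] != labels[v])

def permutation_p_value(edges, labels, n_perm=2000, seed=0):
    """Few between-sample edges is evidence against equal distributions,
    so the p-value is the share of label shuffles with a count at most
    as large as the observed one."""
    rng = random.Random(seed)
    observed = between_sample_edges(edges, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if between_sample_edges(edges, shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Toy data (assumed for illustration): two well-separated samples on a line.
points = [0, 1, 2, 3, 4, 10, 11, 12, 13, 14]
labels = [0] * 5 + [1] * 5
dist = [[abs(a - b) for b in points] for a in points]

edges = mst_edges(dist)
observed, p_value = permutation_p_value(edges, labels)
print(observed, p_value)
```

In this toy example the two samples are far apart, so the tree connects them through a single edge and the permutation p-value is small. For data with many repeated observations, different tie-breaking rules could produce different trees and therefore different counts.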
