Revisiting Classifier Two-Sample Tests

by David Lopez-Paz, et al.

The goal of two-sample tests is to assess whether two samples, S_P ∼ P^n and S_Q ∼ Q^m, are drawn from the same distribution. Perhaps intriguingly, one relatively unexplored method to build two-sample tests is the use of binary classifiers. In particular, one constructs a dataset by pairing the n examples in S_P with a positive label, and the m examples in S_Q with a negative label. If the null hypothesis "P = Q" is true, then the classification accuracy of a binary classifier on a held-out subset of this dataset should remain near chance level. As we will show, such Classifier Two-Sample Tests (C2ST) learn a suitable representation of the data on the fly, return test statistics in interpretable units, have a simple null distribution, and their predictive uncertainty allows us to interpret where P and Q differ. The goal of this paper is to establish the properties, performance, and uses of C2ST. First, we analyze their main theoretical properties. Second, we compare their performance against a variety of state-of-the-art alternatives. Third, we propose their use to evaluate the sample quality of generative models with intractable likelihoods, such as Generative Adversarial Networks (GANs). Fourth, we showcase the novel application of GANs together with C2ST for causal discovery.
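The construction described above can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: it labels S_P with 1 and S_Q with 0, trains a simple logistic-regression classifier (fit here by plain gradient descent; the paper considers richer classifiers such as neural networks), and returns its held-out accuracy as the test statistic. Under the null hypothesis P = Q, this accuracy should stay near 0.5; the function name and hyperparameters are illustrative choices.

```python
import numpy as np

def c2st_accuracy(S_P, S_Q, test_frac=0.5, seed=0, n_iter=500, lr=0.1):
    """Classifier two-sample test statistic: held-out accuracy of a
    binary classifier trained to distinguish S_P from S_Q.
    Under the null hypothesis P = Q, the accuracy stays near 0.5.
    (Illustrative sketch; the choice of classifier and split is up to the user.)"""
    rng = np.random.default_rng(seed)
    # Label examples from S_P as 1 and examples from S_Q as 0.
    X = np.vstack([S_P, S_Q])
    y = np.concatenate([np.ones(len(S_P)), np.zeros(len(S_Q))])
    # Shuffle, then hold out a test split.
    idx = rng.permutation(len(X))
    X, y = X[idx], y[idx]
    n_test = int(test_frac * len(X))
    X_te, y_te = X[:n_test], y[:n_test]
    X_tr, y_tr = X[n_test:], y[n_test:]
    # Logistic regression via gradient descent (bias term appended).
    A = np.hstack([X_tr, np.ones((len(X_tr), 1))])
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ w))       # predicted probabilities
        w -= lr * A.T @ (p - y_tr) / len(A)    # average gradient step
    # Test statistic: classification accuracy on the held-out split.
    A_te = np.hstack([X_te, np.ones((len(X_te), 1))])
    preds = (A_te @ w > 0).astype(float)
    return float((preds == y_te).mean())
```

For example, two samples drawn from the same Gaussian should yield an accuracy near chance, while a sample from a shifted Gaussian should be easy to classify, pushing the accuracy well above 0.5.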



