The UU-test for Statistical Modeling of Unimodal Data

08/28/2020
by Paraskevi Chasani, et al.

Deciding whether a dataset is unimodal is an important problem in data analysis and statistical modeling. It provides knowledge about the structure of the dataset, i.e., whether the data points have been generated by a probability distribution with a single peak or with more than one peak. Such knowledge is useful for several data analysis problems, such as deciding on the number of clusters and determining unimodal projections. We propose a technique called UU-test (Unimodal Uniform test) to decide on the unimodality of a one-dimensional dataset. The method operates on the empirical cumulative distribution function (ecdf) of the dataset. It attempts to build a piecewise linear approximation of the ecdf that is unimodal and models the data sufficiently, in the sense that the data corresponding to each linear segment follow the uniform distribution. A unique feature of this approach is that, in the case of unimodality, it also provides a statistical model of the data in the form of a Uniform Mixture Model. We present experimental results that assess the ability of the method to decide on unimodality and compare it with the well-known dip-test approach. In addition, for unimodal datasets, we evaluate the Uniform Mixture Models provided by the proposed method using the test set log-likelihood and the two-sample Kolmogorov-Smirnov (KS) test.
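
The building blocks named in the abstract can be illustrated with a minimal sketch. This is not the UU-test itself: the abstract does not specify how segments are chosen or how unimodality is enforced. The sketch only shows (a) computing an ecdf, (b) a one-sample KS-style check of whether the points in one segment look uniform, and (c) the log-likelihood of a Uniform Mixture Model. All function names, the asymptotic critical value, and the choice of taking the segment endpoints from the data are assumptions made for illustration.

```python
import math

def ecdf(data):
    """Return the sorted values and the ecdf evaluated at each of them."""
    xs = sorted(data)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]

def ks_uniform_statistic(data):
    """KS distance between the ecdf of `data` and the uniform cdf on
    [min(data), max(data)]. Taking the endpoints from the data is a
    simplifying assumption of this sketch."""
    xs = sorted(data)
    n = len(xs)
    lo, hi = xs[0], xs[-1]
    if hi == lo:
        raise ValueError("segment has zero width")
    d = 0.0
    for i, x in enumerate(xs):
        u = (x - lo) / (hi - lo)          # uniform cdf at x
        d = max(d, (i + 1) / n - u, u - i / n)
    return d

def looks_uniform(data, alpha=0.05):
    """Hypothetical helper: accept uniformity when the KS distance is below
    the usual asymptotic critical value sqrt(-ln(alpha/2)/2) / sqrt(n)."""
    n = len(data)
    critical = math.sqrt(-0.5 * math.log(alpha / 2.0)) / math.sqrt(n)
    return ks_uniform_statistic(data) <= critical

def umm_pdf(x, components):
    """Density of a uniform mixture. `components` is a list of
    (a, b, weight) tuples describing half-open segments [a, b)."""
    return sum(w / (b - a) for a, b, w in components if a <= x < b)

def umm_log_likelihood(data, components):
    """Sum of log densities; -inf if any point falls outside every segment."""
    total = 0.0
    for x in data:
        p = umm_pdf(x, components)
        if p <= 0.0:
            return float("-inf")
        total += math.log(p)
    return total

if __name__ == "__main__":
    # Evenly spread points in one segment versus points piled up at one end.
    even = [i / 99 for i in range(100)]
    skewed = [0.1 * i / 89 for i in range(90)] + [0.9 + 0.1 * i / 9 for i in range(10)]
    print(looks_uniform(even), looks_uniform(skewed))
    # A two-segment uniform mixture evaluated on held-out points.
    model = [(0.0, 1.0, 0.5), (1.0, 3.0, 0.5)]
    print(umm_log_likelihood([0.5, 2.0], model))
```

In this sketch a segment passes when its points are spread evenly between its endpoints, which corresponds to a straight-line piece of the ecdf; a held-out log-likelihood like the one computed by `umm_log_likelihood` is the kind of quantity the abstract mentions for evaluating the fitted mixture.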


research
05/31/2021

Sparse ANOVA Inspired Mixture Models

Based on the analysis of variance (ANOVA) decomposition of functions whi...
research
02/16/2020

Statistical Simulator for the Engine Knock

This paper proposes a statistical simulator for the engine knock based o...
research
04/17/2020

A Mean Field Games model for finite mixtures of Bernoulli distributions

Finite mixture models are an important tool in the statistical analysis ...
research
10/05/2020

Estimating conditional density of missing values using deep Gaussian mixture model

We consider the problem of estimating the conditional probability distri...
research
06/12/2019

Coresets for Gaussian Mixture Models of Any Shape

An ε-coreset for a given set D of n points, is usually a small weighted ...
research
01/28/2019

Out-of-Sample Testing for GANs

We propose a new method to evaluate GANs, namely EvalGAN. EvalGAN relies...
research
09/12/2023

G-Mapper: Learning a Cover in the Mapper Construction

The Mapper algorithm is a visualization technique in topological data an...
