Smooth densities and generative modeling with unsupervised random forests

05/19/2022
by David S. Watson, et al.

Density estimation is a fundamental problem in statistics, and estimating densities in high dimensions typically requires strong assumptions or complex deep learning architectures. An important application for density estimators is synthetic data generation, an area currently dominated by neural networks that often demand enormous training datasets and extensive tuning. We propose a new method based on unsupervised random forests for estimating smooth densities in arbitrary dimensions without parametric constraints, as well as for generating realistic synthetic data. We prove the consistency of our approach and demonstrate its advantages over existing tree-based density estimators, which generally rely on ill-chosen split criteria and do not scale well with data dimensionality. Experiments illustrate that our algorithm compares favorably to state-of-the-art deep learning generative models, achieving superior performance in a range of benchmark trials while executing about two orders of magnitude faster on average. Our method is implemented in easy-to-use R and Python packages.

research
08/01/2022

Quantum Adaptive Fourier Features for Neural Density Estimation

Density estimation is a fundamental task in statistics and machine learn...
research
01/03/2021

Copula Flows for Synthetic Data Generation

The ability to generate high-fidelity synthetic data is crucial when ava...
research
06/22/2022

Neural Inverse Transform Sampler

Any explicit functional representation f of a density is hampered by two...
research
01/08/2020

Learning Generative Models using Denoising Density Estimators

Learning generative probabilistic models that can estimate the continuou...
research
06/22/2012

Estimating Densities with Non-Parametric Exponential Families

We propose a novel approach for density estimation with exponential fami...
research
08/07/2023

Generative Forests

Tabular data represents one of the most prevalent forms of data. When it ...
