Uniform-in-Phase-Space Data Selection with Iterative Normalizing Flows

by Malik Hassanaly, et al.

Improvements in computational and experimental capabilities are rapidly increasing the amount of scientific data that is routinely generated. In applications that are constrained by memory and computational intensity, excessively large datasets may hinder scientific discovery, making data reduction a critical component of data-driven methods. Datasets are growing in two directions: the number of data points and their dimensionality. Whereas data compression techniques are concerned with reducing dimensionality, the focus here is on reducing the number of data points. A strategy is proposed to select data points such that they uniformly span the phase-space of the data. The proposed algorithm relies on estimating the probability map of the data and using it to construct an acceptance probability. An iterative method is used to accurately estimate the probability of rare data points when only a small subset of the dataset is used to construct the probability map. Instead of binning the phase-space to estimate the probability map, its functional form is approximated with a normalizing flow, so the method naturally extends to high-dimensional datasets. The proposed framework is demonstrated as a viable pathway to enable data-efficient machine learning when abundant data is available. An implementation of the method is available in a companion repository (https://github.com/NREL/Phase-space-sampling).
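The selection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not use the companion repository's API. A simple Gaussian kernel density estimate stands in for the normalizing flow, and drawing a fixed number of indices with weights proportional to the inverse density stands in for the acceptance probability. The function names, the bandwidth, and the form of the iterative correction (multiplying the previous estimate by the density of the reweighted subset) are all assumptions made for illustration.

```python
import numpy as np


def kde_log_density(train, query, bandwidth):
    """Unnormalized log-density at `query` from an isotropic Gaussian KDE on `train`.

    Stand-in for the normalizing flow (assumption). The normalization
    constant is dropped because only relative densities matter here.
    """
    sq = ((query[:, None, :] - train[None, :, :]) ** 2).sum(axis=-1)
    log_k = -0.5 * sq / bandwidth**2
    m = log_k.max(axis=1, keepdims=True)
    # Numerically stable log-mean-exp over the training points.
    return m[:, 0] + np.log(np.exp(log_k - m).mean(axis=1))


def inverse_density_weights(log_p):
    """Sampling weights proportional to 1/p, computed in log space."""
    log_w = -log_p
    w = np.exp(log_w - log_w.max())
    return w / w.sum()


def select_uniform(data, n_select, n_fit=300, bandwidth=0.5, n_iter=2, seed=0):
    """Pick `n_select` row indices that cover phase space roughly uniformly.

    Hypothetical helper: each pass fits the density on a small subset drawn
    with the current inverse-density weights, then corrects the estimate.
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    log_p = np.zeros(n)  # flat initial guess, so the first subset is uniform
    for _ in range(n_iter):
        w = inverse_density_weights(log_p)
        fit_idx = rng.choice(n, size=min(n_fit, n), replace=False, p=w)
        # The subset follows q ∝ p / p_old, hence p ∝ q * p_old.
        log_p = kde_log_density(data[fit_idx], data, bandwidth) + log_p
    w = inverse_density_weights(log_p)
    return rng.choice(n, size=n_select, replace=False, p=w)


# Usage: downselect a Gaussian cloud; rare outer points are favored.
demo = np.random.default_rng(1).normal(size=(2000, 2))
chosen = select_uniform(demo, n_select=200)
```

Working in log space avoids division by near-zero densities for rare points, which are exactly the points the selection is meant to retain. A kernel estimate like this one scales poorly with dimension, which is the limitation the normalizing flow in the paper is intended to address.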




MinMaxLTTB: Leveraging MinMax-Preselection to Scale LTTB

Visualization plays an important role in analyzing and exploring time se...

Gravitational Dimensionality Reduction Using Newtonian Gravity and Einstein's General Relativity

Due to the effectiveness of using machine learning in physics, it has be...

Distance metric learning based on structural neighborhoods for dimensionality reduction and classification performance improvement

Distance metric learning can be viewed as one of the fundamental interes...

Space-Filling Subset Selection for an Electric Battery Model

Dynamic models of the battery performance are an essential tool througho...

Deep Learning for Causal Inference

In this paper, we propose deep learning techniques for econometrics, spe...

Random Projection and Its Applications

Random Projection is a foundational research topic that connects a bunch...

Iterative Nadaraya-Watson Distribution Transfer for Colour Grading

We propose a new method with Nadaraya-Watson that maps one N-dimensional...
