Lightweight Data Fusion with Conjugate Mappings

by Christopher L. Dean, et al.

We present an approach to data fusion that combines the interpretability of structured probabilistic graphical models with the flexibility of neural networks. The proposed method, lightweight data fusion (LDF), emphasizes posterior analysis over latent variables using two types of information: primary data, which are well-characterized but with limited availability, and auxiliary data, readily available but lacking a well-characterized statistical relationship to the latent quantity of interest. The lack of a forward model for the auxiliary data precludes the use of standard data fusion approaches, while the inability to acquire latent variable observations severely limits direct application of most supervised learning methods. LDF addresses these issues by utilizing neural networks as conjugate mappings of the auxiliary data: nonlinear transformations into sufficient statistics with respect to the latent variables. This facilitates efficient inference by preserving the conjugacy properties of the primary data and leads to compact representations of the latent variable posterior distributions. We demonstrate the LDF methodology on two challenging inference problems: (1) learning electrification rates in Rwanda from satellite imagery, high-level grid infrastructure, and other sources; and (2) inferring county-level homicide rates in the USA by integrating socio-economic data using a mixture model of multiple conjugate mappings.
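The idea of a conjugate mapping can be illustrated with a small sketch. It is not the authors' implementation, and it rests on several assumptions not stated in the abstract: a Beta prior on a rate, primary data given as binomial counts, and a single linear layer with a softplus output standing in for the neural network. All function and variable names are hypothetical. The mapping turns auxiliary features into nonnegative pseudo-counts, which then add to the prior parameters exactly as primary counts do, so the posterior stays in the Beta family.

```python
import numpy as np

def softplus(z):
    # Numerically stable softplus; keeps the pseudo-counts nonnegative.
    return np.logaddexp(0.0, z)

def conjugate_mapping(x, W, b):
    """Hypothetical stand-in for the neural network: maps auxiliary
    features x to two nonnegative Beta pseudo-counts."""
    return softplus(W @ x + b)

def fuse(prior, aux_counts, successes, trials):
    """Beta posterior parameters after adding auxiliary pseudo-counts
    and primary binomial counts to the prior parameters."""
    alpha = prior[0] + aux_counts[0] + successes
    beta = prior[1] + aux_counts[1] + (trials - successes)
    return alpha, beta

# Illustrative values only (assumed, not taken from the paper).
rng = np.random.default_rng(0)
x = rng.normal(size=4)        # auxiliary features for one region
W = rng.normal(size=(2, 4))   # untrained weights of the stand-in mapping
b = np.zeros(2)

aux = conjugate_mapping(x, W, b)
alpha, beta = fuse((1.0, 1.0), aux, successes=7, trials=10)
posterior_mean = alpha / (alpha + beta)
```

Because the mapping outputs sufficient statistics rather than a point prediction, the fused posterior is summarized by two numbers per region. In this sketch the weights are random; learning them is outside its scope.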




