Generalized Linear Models for Aggregated Data

05/14/2016
by Avradeep Bhowmik, et al.
Databases in domains such as healthcare are routinely released to the public in aggregated form. Unfortunately, naive modeling with aggregated data may significantly diminish the accuracy of inferences at the individual level. This paper addresses the scenario where features are provided at the individual level, but the target variables are only available as histogram aggregates or order statistics. We consider a limiting case of generalized linear modeling in which the target variables are known only up to permutation, and explore how this relates to permutation testing, a standard technique for assessing statistical dependency. Based on this relationship, we propose a simple algorithm that estimates the model parameters and individual-level inferences by alternating between imputation and standard generalized linear model fitting. Our results suggest that the proposed approach is effective when, in the original data, permutation testing accurately ascertains the veracity of the linear relationship. The framework is extended to general histogram data with larger bins, with order statistics such as the median as a limiting case. Our experimental results on simulated data and aggregated healthcare data suggest a diminishing-returns property with respect to the granularity of the histogram: when a linear relationship holds in the original data, the targets can be predicted accurately given relatively coarse histograms.
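The alternating scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the simplest case, a Gaussian model with identity link (ordinary least squares), and the function name `fit_permuted_targets`, its parameters, and the synthetic data are all hypothetical. For squared error, the permutation of the known target values that best matches the current predictions is the one that puts both in the same sorted order, so the imputation step reduces to a sort.

```python
import numpy as np

def fit_permuted_targets(X, y_shuffled, n_iters=50, seed=0):
    """Alternate between least-squares fitting and rank-matching imputation.

    Illustrative sketch only: assumes a linear-Gaussian model and targets
    that are known only as an unordered collection of values.
    """
    rng = np.random.default_rng(seed)
    y_sorted = np.sort(y_shuffled)
    # Start from a random assignment of target values to rows.
    y_hat = rng.permutation(y_sorted)
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        # Fitting step: ordinary least squares on the imputed targets.
        w, *_ = np.linalg.lstsq(X, y_hat, rcond=None)
        # Imputation step: give the k-th smallest target value to the
        # row with the k-th smallest prediction.
        ranks = np.argsort(np.argsort(X @ w))
        y_new = y_sorted[ranks]
        if np.array_equal(y_new, y_hat):
            break  # assignment is stable, so the fit will not change
        y_hat = y_new
    return w, y_hat

# Synthetic data (assumed for illustration): a linear signal whose
# targets are shuffled before being handed to the estimator.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=200)
y_shuffled = rng.permutation(y)

w_est, y_imputed = fit_permuted_targets(X, y_shuffled)
```

Each step cannot increase the squared error, but the procedure is a local search: the result depends on the initial assignment, and with a symmetric target distribution the sign of the coefficients may not be identifiable from permuted targets alone.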

research
01/06/2020

Permutation testing in high-dimensional linear models: an empirical investigation

Permutation testing in linear models, where the number of nuisance coeff...
research
06/21/2019

New methods for multiple testing in permutation inference for the general linear model

Permutation methods are commonly used to test significance of regressors...
research
04/11/2018

Mean and median bias reduction in generalized linear models

This paper presents an integrated framework for estimation and inference...
research
06/23/2022

Regression with Label Permutation in Generalized Linear Model

The assumption that response and predictor belong to the same statistica...
research
07/31/2013

Fast Simultaneous Training of Generalized Linear Models (FaSTGLZ)

We present an efficient algorithm for simultaneously training sparse gen...
research
08/26/2019

Consistently estimating graph statistics using Aggregated Relational Data

Aggregated Relational Data, known as ARD, capture information about a so...
research
07/23/2021

A hierarchical prior for generalized linear models based on predictions for the mean response

There has been increased interest in using prior information in statisti...
