Score-Matching Representative Approach for Big Data Analysis with Generalized Linear Models

11/01/2018
by Keren Li, et al.
We propose a fast and efficient strategy, called the representative approach, for big data analysis with linear models and generalized linear models. Given a partition of a big dataset, this approach constructs a representative data point for each data block and fits the target model using the representative dataset. In terms of time complexity, it is as fast as the subsampling approaches in the literature. In terms of efficiency, it estimates parameters more accurately than the divide-and-conquer method. Based on comprehensive simulation studies and theoretical justifications, we recommend two representative approaches. For linear models, or generalized linear models with a flat inverse link function and moderate coefficients of continuous variables, we recommend mean representatives (MR). For other cases, we recommend score-matching representatives (SMR). In an illustrative application to the Airline on-time performance data, MR and SMR are as good as the full-data estimate when the latter is available. Furthermore, the proposed representative strategy is ideal for analyzing massive data dispersed over a network of interconnected computers.
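To make the idea of a representative dataset concrete, here is a minimal sketch for the linear-model case. It is an illustration under stated assumptions, not the authors' implementation: it assumes each block is summarized by the mean of its covariates and the mean of its response, and that the model is then fitted to these block means by least squares weighted by block size. The function names `mean_representatives` and `fit_weighted_ls` are hypothetical, and the sketch does not cover score-matching representatives.

```python
import numpy as np

def mean_representatives(X, y, block_ids):
    """Summarize each block by (mean of X, mean of y, block size).

    Assumption: one representative point per block, taken as the block mean.
    """
    _, inverse, counts = np.unique(block_ids, return_inverse=True, return_counts=True)
    inverse = inverse.ravel()
    k = counts.size
    X_sum = np.zeros((k, X.shape[1]))
    y_sum = np.zeros(k)
    # Accumulate per-block sums in a single pass over the data.
    np.add.at(X_sum, inverse, X)
    np.add.at(y_sum, inverse, y)
    return X_sum / counts[:, None], y_sum / counts, counts

def fit_weighted_ls(X_rep, y_rep, weights):
    """Weighted least squares with an intercept, fitted to the representatives.

    Assumption: each representative is weighted by its block size.
    """
    A = np.column_stack([np.ones(len(y_rep)), X_rep])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y_rep * sw, rcond=None)
    return coef

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 100_000
    X = rng.normal(size=(n, 2))
    y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.5, size=n)
    # Hypothetical partition: a coarse grid over the two covariates.
    block_ids = np.floor(X[:, 0] * 2).astype(int) * 1000 + np.floor(X[:, 1] * 2).astype(int)
    X_rep, y_rep, w = mean_representatives(X, y, block_ids)
    print("blocks:", len(w))
    print("estimate from representatives:", fit_weighted_ls(X_rep, y_rep, w))
```

Only the representatives, one row per block, enter the model fit, so the fitting cost depends on the number of blocks rather than on the full sample size. Building the representatives still requires one pass over the data.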


research
12/12/2017

A Random Sample Partition Data Model for Big Data Analysis

Big data sets must be carefully partitioned into statistically similar d...
research
02/04/2022

Model Averaging for Generalized Linear Models in Fragmentary Data Prediction

Fragmentary data is becoming more and more popular in many areas which b...
research
11/05/2018

Mixture of generalized linear models: identifiability and applications

We consider finite mixtures of generalized linear models with binary out...
research
02/22/2021

You Only Compress Once: Optimal Data Compression for Estimating Linear Models

Linear models are used in online decision making, such as in machine lea...
research
09/28/2022

Inference in generalized linear models with robustness to misspecified variances

Generalized linear models usually assume a common dispersion parameter. ...
research
01/14/2022

k-parametric Dynamic Generalized Linear Models: a sequential approach via Information Geometry

Dynamic generalized linear models may be seen simultaneously as an exten...
research
07/29/2022

A model robust sub-sampling approach for Generalised Linear Models in Big data settings

In today's modern era of Big data, computationally efficient and scalabl...
