Generalized Linear Models for Longitudinal Data with Biased Sampling Designs: A Sequential Offsetted Regressions Approach

01/13/2020
by Lee S. McDaniel, et al.

Biased sampling designs can be highly efficient when studying rare (binary) or low-variability (continuous) endpoints. We consider longitudinal data settings in which the probability of being sampled depends on a repeatedly measured response through an outcome-related auxiliary variable. Such auxiliary variable-dependent or outcome-dependent sampling improves the variability of the observed response, and possibly of the exposure, relative to random sampling, even though the auxiliary variable itself is not of scientific interest. For analysis, we propose a generalized linear model-based approach that uses a sequence of two offsetted regressions. The first estimates the relationship of the auxiliary variable to the response and covariate data using an offsetted logistic regression model, where the offset depends on the (assumed) known ratio of sampling probabilities for different values of the auxiliary variable. Results from this auxiliary model are used to estimate observation-specific probabilities of being sampled conditional on the response and covariates, and these probabilities are then used to account for sampling bias in the second, target population model. We provide asymptotic standard errors that account for uncertainty in the estimation of the auxiliary model, and we perform simulation studies demonstrating substantial bias reduction, correct coverage probability, and improved design efficiency over simple random sampling designs. We illustrate the approaches with two examples.
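The two-step idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a cross-sectional special case with a binary response `y`, a binary auxiliary variable `z`, and a single covariate `x`, and it omits the longitudinal structure and the standard-error correction. All names and parameter values (`BETA`, `GAMMA`, `PI`, `fit_logistic`, `sampling_prob`) are hypothetical. To keep the result reproducible, the sketch fits the models to expected cell weights of the biased sample rather than to random draws.

```python
import math
from itertools import product

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_logistic(rows, resp, offset, weight, iters=50):
    """Weighted logistic regression with a fixed offset, via Newton-Raphson."""
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(iters):
        score = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for xr, yv, o, w in zip(rows, resp, offset, weight):
            mu = expit(o + sum(b * v for b, v in zip(beta, xr)))
            for j in range(p):
                score[j] += w * (yv - mu) * xr[j]
                for k in range(p):
                    info[j][k] += w * mu * (1 - mu) * xr[j] * xr[k]
        beta = [b + s for b, s in zip(beta, solve(info, score))]
    return beta

# Hypothetical "true" values, chosen only for illustration.
BETA = (-2.0, 0.5)        # target model: logit P(Y=1 | x) = b0 + b1*x
GAMMA = (-1.0, 2.0, 0.3)  # auxiliary model: logit P(Z=1 | y, x) = g0 + g1*y + g2*x
PI = {0: 0.1, 1: 1.0}     # assumed known sampling probabilities P(S=1 | Z=z)

# Expected weight of each (x, y, z) cell in the biased sample.
cells = []
for x, y, z in product((-1.0, 0.0, 1.0), (0, 1), (0, 1)):
    py = expit(BETA[0] + BETA[1] * x)
    pz = expit(GAMMA[0] + GAMMA[1] * y + GAMMA[2] * x)
    w = (1 / 3) * (py if y else 1 - py) * (pz if z else 1 - pz) * PI[z]
    cells.append((x, y, z, w))
weights = [c[3] for c in cells]

# Step 1: auxiliary model for Z given (y, x); offset = log ratio of sampling probabilities.
off1 = [math.log(PI[1] / PI[0])] * len(cells)
gamma_hat = fit_logistic([[1.0, c[1], c[0]] for c in cells],
                         [c[2] for c in cells], off1, weights)

def sampling_prob(y, x):
    """Estimated P(S=1 | y, x): PI averaged over the fitted auxiliary model."""
    pz = expit(gamma_hat[0] + gamma_hat[1] * y + gamma_hat[2] * x)
    return PI[1] * pz + PI[0] * (1 - pz)

# Step 2: target model for Y given x; offset = log ratio of estimated sampling probabilities.
off2 = [math.log(sampling_prob(1, c[0]) / sampling_prob(0, c[0])) for c in cells]
X2 = [[1.0, c[0]] for c in cells]
Y2 = [c[1] for c in cells]
beta_hat = fit_logistic(X2, Y2, off2, weights)
beta_naive = fit_logistic(X2, Y2, [0.0] * len(cells), weights)  # ignores the design

print("offset-corrected:", [round(b, 4) for b in beta_hat])
print("naive:           ", [round(b, 4) for b in beta_naive])
```

In this binary setting both offsets follow from Bayes' rule: conditioning on being sampled shifts the log-odds of `z` by the log ratio of the sampling probabilities, and shifts the log-odds of `y` by the log ratio of the sampling probabilities given `y` and `x`. Because the sketch uses expected weights, the offset-corrected fit recovers the assumed `BETA` up to numerical error, while the naive fit that ignores the design does not. With real sampled data the same two fits would carry sampling error, and the abstract's corrected standard errors would be needed for inference.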


