Optimization of Survey Weights under a Large Number of Conflicting Constraints
In the analysis of survey data, sampling weights are needed for consistent estimation of population quantities. However, the original inverse probability weights from the survey sample design are typically modified to account for non-response, to increase efficiency by incorporating auxiliary population information, and to reduce the variability in estimates due to extreme weights. It is often the case that no single set of weights can be found that successfully incorporates all of these modifications, because together they induce a large number of constraints and restrictions on the feasible solution space. For example, a unique combination of categorical variables may not be present in the sample data, even if the corresponding population-level information is available. Additional requirements for weights to fall within specified ranges may also lead to fewer population-level adjustments being incorporated. We present a framework and accompanying computational methods to address this issue of constraint achievement or selection within a restricted space, in order to produce revised weights with reasonable properties. By combining concepts from generalized raking, ridge and lasso regression, benchmarking of small area estimates, augmentation of state-space equations, path algorithms, and data cloning, this framework simultaneously selects constraints and provides diagnostics suggesting why a fully constrained solution is not possible. Combinatorial operations, such as brute-force evaluation of all possible combinations of constraints and restrictions, are avoided. We demonstrate this framework by applying alternative methods to post-stratification for the National Survey on Drug Use and Health. We also discuss strategies for scaling up to even larger data sets. Computations were performed in R, and code is available from the authors.
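To make the calibration problem concrete, the following R sketch (not the authors' framework, just standard generalized raking via the survey package's calibrate()) rakes base weights to hypothetical population margins under bounds on the weight-adjustment ratios. All data, margins, and bounds below are invented for demonstration; tightening the bounds or adding further margins is exactly the kind of change that can render the problem infeasible, which is the situation the paper's constraint-selection framework is designed to handle.

# Illustrative sketch only; data and margins are invented for demonstration.
library(survey)

set.seed(1)
n <- 200
samp <- data.frame(
  age_grp = factor(sample(c("18-25", "26-49", "50+"), n, replace = TRUE)),
  sex     = factor(sample(c("F", "M"), n, replace = TRUE)),
  w0      = runif(n, 3, 7)   # base inverse-probability weights (invented)
)
des <- svydesign(ids = ~1, weights = ~w0, data = samp)

# Hypothetical population margins; totals are consistent with N = 1000
pop.totals <- c(`(Intercept)` = 1000,
                `age_grp26-49` = 450, `age_grp50+` = 250,
                sexM = 480)

# Generalized raking with bounds on the weight-adjustment ratios.
# Tighter bounds or more margins can make this step fail to converge;
# that infeasibility is what motivates selecting among the constraints.
cal <- calibrate(des, ~ age_grp + sex, population = pop.totals,
                 calfun = "raking", bounds = c(0.5, 2))
summary(weights(cal))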