Interpolating Distributions for Populations in Nested Geographies using Public-use Data with Application to the American Community Survey
Statistical agencies often publish multiple data products from the same survey. First, they produce aggregate estimates of various features of the distributions of several socio-demographic quantities of interest. Often these area-level estimates are tabulated at small geographies. Second, statistical agencies frequently produce weighted public-use microdata samples (PUMS) that provide detailed information on the entire distribution of the same socio-demographic variables. However, the public-use microdata areas usually constitute relatively large geographies in order to protect against the identification of households or individuals included in the sample. These two data products represent a trade-off in official statistics: publicly available data products can provide either detailed spatial information or detailed distributional information, but not both. We propose a model-based method that combines these two data products to produce estimates of detailed features of a given variable at a high degree of spatial resolution. Our motivating example uses the disseminated tabulations and PUMS from the American Community Survey to estimate U.S. Census tract-level income distributions and statistics associated with these distributions.