Better understanding of the multivariate hypergeometric distribution with implications in design-based survey sampling
The multivariate hypergeometric distribution arises frequently in elementary statistics and probability courses for simultaneously studying the occurrence law of specified events when sampling without replacement from a finite population with a fixed number of classes. The covariance matrix of this distribution is well known to equal its multinomial counterpart multiplied by 1-(n-1)/(N-1) (equivalently, (N-n)/(N-1)), where N and n are the population and sample sizes, respectively. However, the meaning of this relationship, especially the specific form of the multiplier, appears to have received less discussion in the literature. Based on an augmenting argument together with probabilistic symmetry, we present a more transparent understanding of the covariance structure of the multivariate hypergeometric distribution. We discuss implications of these combined techniques and provide a unified description of the relative efficiency of estimating a population mean under simple random sampling, probability-proportional-to-size sampling and adaptive cluster sampling, with versus without replacement. We also provide insight into the classic random group method for variance estimation.
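The covariance relationship stated above can be checked numerically on a small population. The sketch below is not the authors' augmenting/symmetry argument; it is a brute-force check by full enumeration with exact rational arithmetic. It compares the exact multivariate hypergeometric covariance with the multinomial covariance (diagonal n p_i(1 - p_i), off-diagonal -n p_i p_j, with p_i = K_i/N) scaled by 1-(n-1)/(N-1). It also checks the familiar simple-random-sampling special case: the variance of the sample mean without replacement equals the with-replacement variance times the same multiplier. All function names and the example populations are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb


def mvhg_covariance_exact(counts, n):
    """Exact covariance of the multivariate hypergeometric, by enumerating all outcomes."""
    N, k = sum(counts), len(counts)
    total = comb(N, n)
    mean = [Fraction(0)] * k
    second = [[Fraction(0)] * k for _ in range(k)]
    # Every vector x with 0 <= x_i <= K_i and sum(x) == n is a possible sample composition.
    for x in product(*(range(c + 1) for c in counts)):
        if sum(x) != n:
            continue
        ways = 1
        for c, xi in zip(counts, x):
            ways *= comb(c, xi)
        p = Fraction(ways, total)
        for i in range(k):
            mean[i] += p * x[i]
            for j in range(k):
                second[i][j] += p * x[i] * x[j]
    return [[second[i][j] - mean[i] * mean[j] for j in range(k)] for i in range(k)]


def multinomial_covariance(counts, n):
    """Multinomial covariance n * p_i * (delta_ij - p_j) with p_i = K_i / N."""
    N, k = sum(counts), len(counts)
    p = [Fraction(c, N) for c in counts]
    return [[n * p[i] * ((1 if i == j else 0) - p[j]) for j in range(k)] for i in range(k)]


def scaled_multinomial_covariance(counts, n):
    """Multinomial covariance multiplied by 1 - (n-1)/(N-1); requires N > 1."""
    N = sum(counts)
    factor = 1 - Fraction(n - 1, N - 1)
    return [[factor * v for v in row] for row in multinomial_covariance(counts, n)]


def srs_mean_variances(y, n):
    """Exact variance of the sample mean under SRS without and with replacement (integer y)."""
    N = len(y)
    mu = Fraction(sum(y), N)
    subsets = list(combinations(y, n))      # equally likely unordered samples (by position)
    v_wor = sum((Fraction(sum(s), n) - mu) ** 2 for s in subsets) / len(subsets)
    draws = list(product(y, repeat=n))      # equally likely ordered draws with replacement
    v_wr = sum((Fraction(sum(d), n) - mu) ** 2 for d in draws) / len(draws)
    return v_wor, v_wr


if __name__ == "__main__":
    counts, n = [3, 5, 2], 4                # hypothetical population: 3 classes, N = 10
    assert mvhg_covariance_exact(counts, n) == scaled_multinomial_covariance(counts, n)

    y, m = [1, 2, 3, 7, 11], 3              # hypothetical study-variable values, N = 5
    v_wor, v_wr = srs_mean_variances(y, m)
    assert v_wor / v_wr == 1 - Fraction(m - 1, len(y) - 1)
    print("both identities hold exactly")
```

Exact `Fraction` arithmetic is used so the comparison is an equality rather than a floating-point tolerance check; enumeration is only feasible for small N and n.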