Study on upper limit of sample sizes for a two-level test in NIST SP800-22
NIST SP800-22 is one of the most widely used statistical test suites for pseudorandom number generators (PRNGs). The suite consists of 15 tests (one-level tests) and two additional tests (two-level tests). Each one-level test provides one or more p-values. The two-level tests measure the uniformity of the p-values obtained from a fixed one-level test. One of the two-level tests sorts the p-values into ten intervals of equal length and applies a chi-squared goodness-of-fit test. This two-level test is often more powerful than the one-level tests, but it sometimes rejects even good PRNGs when the sample size at the second level is too large, because it then detects approximation errors in the computation of the p-values. In this paper, we propose a practical upper limit on the sample size in this two-level test for each of six tests in SP800-22. These upper limits are derived from the chi-squared discrepancy between the distribution of the approximated p-values and the uniform distribution U(0, 1). We also compute a "risky" sample size at the second level for each one-level test. Experiments show that the two-level test with the proposed upper limit gives appropriate results, while using the risky size often rejects even good PRNGs. We also propose another improvement: using the exact probabilities of the ten categories when computing the goodness-of-fit statistic in the two-level test. This allows the sample size at the second level to be increased, and would make the test more sensitive than the usage recommended by NIST.
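As an illustration of the two-level test described above, the following minimal Python sketch sorts a list of p-values into ten equal-length intervals and computes the chi-squared goodness-of-fit statistic with its second-level p-value (9 degrees of freedom). The function names `chi2_sf_9` and `two_level_test` and the optional `probs` argument are assumptions made for this sketch and are not taken from the paper or from the NIST implementation. Passing `probs` is meant to mirror the idea of replacing the uniform value 0.1 by exact category probabilities; those probabilities are not given here and would have to be supplied by the user.

```python
import math


def chi2_sf_9(x):
    """Upper-tail probability of a chi-squared variable with 9 degrees of freedom.

    Uses the closed form available for odd degrees of freedom, so only the
    standard library is needed.
    """
    if x <= 0:
        return 1.0
    series = 1 + x / 3 + x**2 / 15 + x**3 / 105
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * series)


def two_level_test(p_values, probs=None):
    """Chi-squared uniformity test on first-level p-values (hypothetical helper).

    p_values: p-values produced by one fixed one-level test.
    probs:    expected probability of each of the ten categories;
              defaults to 0.1 each, i.e. the uniform distribution U(0, 1).
    Returns (chi-squared statistic, second-level p-value).
    """
    k = 10
    if probs is None:
        probs = [1.0 / k] * k
    counts = [0] * k
    for p in p_values:
        # p == 1.0 is placed in the last interval
        counts[min(int(p * k), k - 1)] += 1
    n = len(p_values)
    stat = sum((c - n * q) ** 2 / (n * q) for c, q in zip(counts, probs))
    return stat, chi2_sf_9(stat)


if __name__ == "__main__":
    # Evenly spread p-values: every category receives the same count.
    uniform_like = [(i + 0.5) / 100 for i in range(100)]
    print(two_level_test(uniform_like))
```

In this sketch the second-level sample size is simply `len(p_values)`; the upper limits discussed in the abstract would constrain how large that length may be for a given one-level test.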