Generalized k-Means in GLMs with Applications to the Outbreak of COVID-19 in the United States
Generalized k-means can be combined with any similarity or dissimilarity measure for clustering. Taking the well-known likelihood ratio statistic or F-statistic as the dissimilarity measure, this work proposes a method based on generalized k-means for grouping statistical models. When the number of clusters k is given, the method is built on hypothesis tests between statistical models. When k is unknown, the method can be combined with the generalized information criterion (GIC) to select k automatically. The article investigates AIC and BIC as special cases of GIC. Theoretical and simulation results show that BIC can identify the number of clusters, whereas AIC cannot. The resulting method for generalized linear models (GLMs) is used to group state-level time series patterns of the COVID-19 outbreak in the United States. A follow-up analysis shows that the fitted models differ significantly between clusters, which confirms the grouping produced by the proposed method.
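To make the idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes Gaussian linear models as a simple stand-in for GLMs, so the likelihood-based dissimilarity between a unit and a cluster reduces (up to scale) to the unit's residual sum of squares under the cluster's pooled fit. The function names `pooled_betas`, `unit_rss`, `generalized_kmeans`, and `gic`, as well as the simulated data, are hypothetical and chosen only for illustration. The penalty `lam = 2` gives AIC and `lam = log(n)` gives BIC.

```python
import numpy as np

def pooled_betas(Xs, ys, labels, k, rng):
    # Fit one least-squares model per cluster on the pooled data of its units.
    betas = []
    for c in range(k):
        idx = np.flatnonzero(labels == c)
        if idx.size == 0:
            # Assumption: an empty cluster is refit on one randomly chosen unit.
            idx = rng.integers(len(Xs), size=1)
        X = np.vstack([Xs[i] for i in idx])
        y = np.concatenate([ys[i] for i in idx])
        betas.append(np.linalg.lstsq(X, y, rcond=None)[0])
    return betas

def unit_rss(Xs, ys, betas):
    # Dissimilarity matrix: RSS of each unit under each cluster's model.
    return np.array([[np.sum((y - X @ b) ** 2) for b in betas]
                     for X, y in zip(Xs, ys)])

def generalized_kmeans(Xs, ys, k, n_iter=100, seed=0):
    # Alternate between refitting cluster models and reassigning units.
    rng = np.random.default_rng(seed)
    labels = rng.permutation(len(Xs)) % k  # balanced random start
    for _ in range(n_iter):
        d = unit_rss(Xs, ys, pooled_betas(Xs, ys, labels, k, rng))
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    # Refit once more so the reported RSS matches the final labels.
    d = unit_rss(Xs, ys, pooled_betas(Xs, ys, labels, k, rng))
    rss = d[np.arange(len(Xs)), labels].sum()
    return labels, rss

def gic(rss, n_obs, n_params, lam):
    # Gaussian -2 log-likelihood (up to a constant) plus lam per parameter.
    return n_obs * np.log(rss / n_obs) + lam * n_params

if __name__ == "__main__":
    # Simulated example: six units, two groups with opposite linear trends.
    rng = np.random.default_rng(1)
    t = np.linspace(0.0, 1.0, 30)
    X = np.column_stack([np.ones_like(t), t])
    slopes = [2.0, 2.0, 2.0, -2.0, -2.0, -2.0]
    Xs = [X] * len(slopes)
    ys = [s * t + 0.1 * rng.standard_normal(t.size) for s in slopes]
    n = len(slopes) * t.size
    for k in (1, 2, 3):
        labels, rss = generalized_kmeans(Xs, ys, k)
        bic = gic(rss, n, k * X.shape[1], np.log(n))
        print(k, labels, round(bic, 1))
```

In this sketch the number of clusters would be chosen as the k with the smallest criterion value. For a genuine GLM (for example, Poisson counts), the RSS would be replaced by the model deviance and the least-squares fit by a GLM fit.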