Similarity-based Random Partition Distribution for Clustering Functional Data
Random partition distributions are a powerful tool for model-based clustering. However, applying them in practice can be challenging for functional spatial data, such as population counts observed hourly in each region. The reasons are that high dimensionality tends to yield excess clusters and that spatial dependencies are difficult to represent with a simple random partition distribution (e.g., the Dirichlet process). This paper addresses these issues by extending the generalized Dirichlet process to incorporate pairwise similarity information; we call the resulting distribution the similarity-based generalized Dirichlet process (SGDP) and provide theoretical justification for this approach. We apply SGDP to hourly population data observed on 500 m meshes in Tokyo and demonstrate its usefulness for functional clustering that takes spatial information into account.