Segmentation of high dimensional means over multi-dimensional change points and connections to regression trees
This article is motivated by the objective of providing a new, analytically tractable, and fully frequentist framework to characterize and implement regression trees while also allowing a multivariate (potentially high dimensional) response. The connection to regression trees is made through a high dimensional model with dynamic mean vectors over multi-dimensional change axes. Our theoretical analysis is carried out under a single two dimensional change point setting. An optimal rate of convergence of the proposed estimator is obtained, which in turn allows for the existence of limiting distributions. The distributional behavior of the change point estimates is split into two distinct regimes, and the limiting distributions under each regime are then characterized, in turn allowing construction of asymptotically valid confidence intervals for the 2d location of change. All results are obtained under the high dimensional scaling s log^2 p = o(T_w T_h), where p is the response dimension, s is a sparsity parameter, and T_w, T_h are the sampling periods along the change axes. We characterize full regression trees by defining a multiple multi-dimensional change point model. Natural extensions of the single 2d change point estimation methodology are provided. Two applications are provided, the first on segmentation of Infrared Astronomical Satellite (IRAS) data and the second on segmentation of digital images. Methodology and theoretical results are supported with Monte Carlo simulations.
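To make the single 2d change point setting concrete, the following is a minimal sketch under simplifying assumptions that are not taken from the article: the mean vector is assumed to take one value on the rectangle {w < tau_w, h < tau_h} of a T_w x T_h grid and another value elsewhere, the noise is standard Gaussian, and the change point is found by a plain least squares grid search. The function names `simulate` and `estimate_change_point` are hypothetical, and the search is an illustration only; it does not reproduce the proposed estimator or any sparsity-based regularization the high dimensional setting would require.

```python
import numpy as np


def simulate(T_w, T_h, p, tau, delta, rng):
    """Simulate a T_w x T_h grid of p-dimensional observations.

    Assumed toy model (not the article's): the mean is `delta` in every
    coordinate on the rectangle w < tau[0], h < tau[1], and 0 elsewhere.
    """
    x = rng.standard_normal((T_w, T_h, p))
    x[: tau[0], : tau[1], :] += delta
    return x


def estimate_change_point(x):
    """Least squares grid search for a single 2d change point.

    For each candidate (tw, th), fit one mean vector inside the rectangle
    and one outside, and keep the candidate with the smallest residual
    sum of squares.
    """
    T_w, T_h, _ = x.shape
    best, best_loss = None, np.inf
    for tw in range(1, T_w):
        for th in range(1, T_h):
            mask = np.zeros((T_w, T_h), dtype=bool)
            mask[:tw, :th] = True
            inside, outside = x[mask], x[~mask]
            loss = ((inside - inside.mean(axis=0)) ** 2).sum()
            loss += ((outside - outside.mean(axis=0)) ** 2).sum()
            if loss < best_loss:
                best, best_loss = (tw, th), loss
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = simulate(T_w=20, T_h=20, p=5, tau=(8, 12), delta=3.0, rng=rng)
    print(estimate_change_point(x))
```

The exhaustive search costs on the order of T_w T_h candidate splits, each requiring a pass over the data, which is acceptable for a small illustration but would need a more careful implementation at scale.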