How do kernel-based sensor fusion algorithms behave under high dimensional noise?
We study the behavior of two kernel-based sensor fusion algorithms, nonparametric canonical correlation analysis (NCCA) and alternating diffusion (AD), under the nonnull setting in which the clean datasets collected from two sensors are modeled by a common low dimensional manifold embedded in a high dimensional Euclidean space and the datasets are corrupted by high dimensional noise. We establish the asymptotic limits and convergence rates for the eigenvalues of the associated kernel matrices, assuming that the sample dimension and sample size are comparably large and that NCCA and AD are conducted using the Gaussian kernel. It turns out that both the asymptotic limits and the convergence rates depend on the signal-to-noise ratio (SNR) of each sensor and on the selected bandwidths. On one hand, we show that if NCCA and AD are applied directly to the noisy point clouds without any sanity check, they may generate artificial information that misleads scientists' interpretation. On the other hand, we prove that if the bandwidths are selected adequately, both NCCA and AD can be made robust to high dimensional noise when the SNRs are relatively large.
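To make the setting concrete, the following is a minimal numerical sketch of the AD side only, not the authors' implementation. It assumes the common construction of alternating diffusion as the product of two row-normalized Gaussian affinity matrices, one per sensor. The circle manifold, the `signal_strength` parameter used as a stand-in for SNR, the median-distance bandwidth rule, and all function names are illustrative assumptions that do not come from the abstract.

```python
import numpy as np


def pairwise_sq_dists(X):
    # Squared Euclidean distances between all rows of X.
    sq_norms = np.sum(X**2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    return np.maximum(d2, 0.0)  # clip tiny negatives from round-off


def gaussian_transition(X):
    # Gaussian affinity exp(-d^2 / h), then row normalization.
    # Assumption: h is the median off-diagonal squared distance
    # (a hypothetical heuristic, not the paper's bandwidth rule).
    d2 = pairwise_sq_dists(X)
    h = np.median(d2[np.triu_indices(X.shape[0], k=1)])
    W = np.exp(-d2 / h)
    return W / W.sum(axis=1, keepdims=True)


def make_sensor(theta, p, signal_strength, rng):
    # Clean data: a circle embedded in the first two coordinates of R^p.
    # Noise: i.i.d. standard Gaussian in all p coordinates.
    n = theta.shape[0]
    clean = np.zeros((n, p))
    clean[:, 0] = np.cos(theta)
    clean[:, 1] = np.sin(theta)
    return signal_strength * clean + rng.standard_normal((n, p))


rng = np.random.default_rng(0)
n, p = 200, 200  # sample size and dimension comparably large
theta = rng.uniform(0.0, 2.0 * np.pi, size=n)  # shared latent variable

X1 = make_sensor(theta, p, signal_strength=30.0, rng=rng)
X2 = make_sensor(theta, p, signal_strength=30.0, rng=rng)

# Alternating diffusion matrix: product of the two transition matrices.
AD = gaussian_transition(X1) @ gaussian_transition(X2)

# The product is generally non-symmetric, so eigenvalues may be complex;
# sort them by magnitude, largest first.
eigvals = np.linalg.eigvals(AD)
eigvals = eigvals[np.argsort(-np.abs(eigvals))]
print(np.abs(eigvals[:5]))
```

Rerunning the sketch with a smaller `signal_strength` or a different bandwidth rule gives an informal way to see how the leading eigenvalues respond to the noise level and bandwidth choice, which is the dependence the abstract describes.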