Cross-validation of matching correlation analysis by resampling matching weights

by Hidetoshi Shimodaira, et al.

The strength of association between a pair of data vectors is represented by a nonnegative real number, called a matching weight. For dimensionality reduction, we consider a linear transformation of data vectors, and define the matching error as the weighted sum of squared distances between transformed vectors with respect to the matching weights. Given data vectors and matching weights, the optimal linear transformation minimizing the matching error is obtained by the spectral graph embedding of Yan et al. (2007). This method is a generalization of canonical correlation analysis, and will be called matching correlation analysis (MCA). In this paper, we consider a novel sampling scheme in which the observed matching weights are randomly sampled from underlying true matching weights with small probability, whereas the data vectors are treated as constants. We then investigate cross-validation by resampling the matching weights. Our asymptotic theory shows that the cross-validation, if rescaled properly, computes an unbiased estimate of the matching error with respect to the true matching weights. Existing ideas of cross-validation for resampling data vectors, instead of resampling matching weights, are not applicable here. MCA can be used for data vectors from multiple domains with different dimensions via an embarrassingly simple idea of coding the data vectors. This method will be called cross-domain matching correlation analysis (CDMCA), and an interesting connection to the classical associative memory model of neural networks is also discussed.



