Local convergence rates of the least squares estimator with applications to transfer learning
Convergence properties of empirical risk minimizers can be conveniently expressed in terms of the associated population risk. To derive bounds for the performance of the estimator under covariate shift, however, pointwise convergence rates are required. Under weak assumptions on the design distribution, it is shown that the least squares estimator (LSE) over 1-Lipschitz functions is also minimax rate optimal with respect to a weighted uniform norm, where the weighting accounts in a natural way for the non-uniformity of the design distribution. Moreover, this implies that, although least squares is a global criterion, the LSE is locally adaptive. We develop a new indirect proof technique that establishes the local convergence behavior based on a carefully chosen local perturbation of the LSE. These local rates are then used to construct a rate-optimal estimator for transfer learning under covariate shift.
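For orientation, the estimator discussed above can be written out as follows. This is a sketch in standard nonparametric regression notation, which the abstract itself does not fix: the symbols $(X_i, Y_i)$, $f_0$, $\varepsilon_i$, $\widehat f_n$, and $\operatorname{Lip}(1)$ are assumptions introduced here for illustration, and the paper's own definitions may differ in detail.

```latex
% Assumed regression model: n observations with unknown regression function f_0
\[
  Y_i = f_0(X_i) + \varepsilon_i, \qquad i = 1, \dots, n .
\]

% Class of 1-Lipschitz functions on the covariate domain
\[
  \operatorname{Lip}(1) = \bigl\{ f : |f(x) - f(y)| \le |x - y| \ \text{for all } x, y \bigr\} .
\]

% Least squares estimator (LSE) over this class: a global criterion,
% since the sum runs over all design points
\[
  \widehat f_n \in \operatorname*{arg\,min}_{f \in \operatorname{Lip}(1)}
  \sum_{i=1}^{n} \bigl( Y_i - f(X_i) \bigr)^2 .
\]
```

Under covariate shift, the design distribution of the covariates differs between the training and the target data while the regression function $f_0$ stays the same. This is why a bound on the averaged population risk under the training design is not enough, and control of $|\widehat f_n(x) - f_0(x)|$ at individual points $x$ is needed.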