On the Provable Advantage of Unsupervised Pretraining

03/02/2023
by Jiawei Ge, et al.

Unsupervised pretraining, which learns a useful representation from a large amount of unlabeled data to facilitate the learning of downstream tasks, is a critical component of modern large-scale machine learning systems. Despite its tremendous empirical success, rigorous theoretical understanding of why unsupervised pretraining generally helps remains rather limited: most existing results are restricted to particular methods or approaches for unsupervised pretraining with specialized structural assumptions. This paper studies a generic framework in which the unsupervised representation learning task is specified by an abstract class of latent variable models Φ and the downstream task is specified by a class of prediction functions Ψ. We consider the natural approach of using Maximum Likelihood Estimation (MLE) for unsupervised pretraining and Empirical Risk Minimization (ERM) for learning downstream tasks. We prove that, under a mild "informative" condition, our algorithm achieves an excess risk of 𝒪̃(√(𝒞_Φ/m) + √(𝒞_Ψ/n)) for downstream tasks, where 𝒞_Φ, 𝒞_Ψ are complexity measures of the function classes Φ, Ψ, and m, n are the numbers of unlabeled and labeled data respectively. Compared to the baseline of 𝒪̃(√(𝒞_Φ∘Ψ/n)) achieved by performing supervised learning using only the labeled data, our result rigorously shows the benefit of unsupervised pretraining when m ≫ n and 𝒞_Φ∘Ψ > 𝒞_Ψ. This paper further shows that our generic framework covers a wide range of approaches for unsupervised pretraining, including factor models, Gaussian mixture models, and contrastive learning.
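
As a rough illustration of the two-stage pipeline the abstract describes (MLE pretraining over a latent variable class Φ, then ERM over a prediction class Ψ on the labeled data), the sketch below instantiates Φ as a Gaussian mixture model and Ψ as logistic regression over the learned posterior representation. All names, data, and hyperparameters here are illustrative assumptions, not the authors' code or experimental setup.

    # Minimal sketch: MLE pretraining on unlabeled data, then ERM on labeled data.
    # Assumes a Gaussian mixture as the latent-variable class Phi and a
    # logistic-regression class Psi; synthetic data stands in for a real dataset.
    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    # m unlabeled samples and n << m labeled samples (placeholder data).
    m, n, d = 5000, 200, 10
    X_unlabeled = rng.normal(size=(m, d))
    X_labeled = rng.normal(size=(n, d))
    y_labeled = (X_labeled[:, 0] > 0).astype(int)

    # Stage 1: unsupervised pretraining via MLE over Phi
    # (EM for a Gaussian mixture maximizes the marginal likelihood).
    phi = GaussianMixture(n_components=3, random_state=0).fit(X_unlabeled)

    # Learned representation: posterior responsibilities over latent components.
    def representation(X):
        return phi.predict_proba(X)

    # Stage 2: ERM over the downstream class Psi on the small labeled set,
    # using the pretrained representation as input features.
    psi = LogisticRegression().fit(representation(X_labeled), y_labeled)
    print("downstream training accuracy:",
          psi.score(representation(X_labeled), y_labeled))

The point of the split is visible in the sample sizes: the statistically expensive part (fitting Φ) only consumes unlabeled data, so the labeled sample n only needs to pay for the complexity of the simpler class Ψ, mirroring the 𝒪̃(√(𝒞_Φ/m) + √(𝒞_Ψ/n)) bound.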
