Convergence of Contrastive Divergence with Annealed Learning Rate in Exponential Family
In our recent paper (wu2016convergence), we showed that, in the exponential family, contrastive divergence (CD) with a fixed learning rate gives asymptotically consistent estimates. In this paper, we establish the consistency and convergence rate of CD with an annealed learning rate $\eta_t$. Specifically, suppose CD-$m$ generates the sequence of parameters $\{\theta_t\}_{t>0}$ using an i.i.d. data sample $X_1^n \sim p_{\theta^*}$ of size $n$; then $\delta_n(X_1^n) = \lim_{t\to\infty} \frac{\sum_{s=t_0}^{t} \eta_s \theta_s}{\sum_{s=t_0}^{t} \eta_s} - \theta^*$ converges in probability to 0 at a rate of $1/\sqrt{n}$. The number $m$ of MCMC transitions in CD affects only the constant factor of the convergence rate. Our proof is not a simple extension of the one in wu2016convergence, which depends critically on the fact that $\{\theta_t\}_{t>0}$ is a homogeneous Markov chain conditional on the observed sample $X_1^n$. Under an annealed learning rate, the homogeneous Markov property no longer holds, so we develop an alternative approach based on super-martingales. Experimental results of CD on a fully-visible $2\times 2$ Boltzmann machine are provided to demonstrate our theoretical results.
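To make the estimator concrete, here is a minimal Python sketch of CD-$m$ with an annealed learning rate and the $\eta$-weighted iterate average $\sum_{s=t_0}^{t}\eta_s\theta_s / \sum_{s=t_0}^{t}\eta_s$, instantiated on a small fully-visible Boltzmann machine with two $\pm 1$ units. The schedule $\eta_t = c/t$, the coding of the units, the specific $\theta^*$, and all function names are illustrative assumptions, not the paper's exact experimental settings.

```python
import numpy as np

# Hypothetical sketch: CD-m with annealed learning rate eta_t = c/t on a
# fully-visible Boltzmann machine with two +/-1 units,
#   p_theta(x) ~ exp(w*x1*x2 + b1*x1 + b2*x2),
# sufficient statistics T(x) = (x1*x2, x1, x2), theta = (w, b1, b2).

rng = np.random.default_rng(0)

def suff_stats(x):
    """Sufficient statistics T(x) of the exponential family, row-wise."""
    return np.stack([x[:, 0] * x[:, 1], x[:, 0], x[:, 1]], axis=1)

def gibbs_step(x, theta):
    """One full Gibbs sweep over both units (the CD MCMC transition)."""
    w, b1, b2 = theta
    x = x.copy()
    # p(x1 = +1 | x2) = sigmoid(2*(w*x2 + b1)), and symmetrically for x2
    p1 = 1.0 / (1.0 + np.exp(-2.0 * (w * x[:, 1] + b1)))
    x[:, 0] = np.where(rng.random(len(x)) < p1, 1.0, -1.0)
    p2 = 1.0 / (1.0 + np.exp(-2.0 * (w * x[:, 0] + b2)))
    x[:, 1] = np.where(rng.random(len(x)) < p2, 1.0, -1.0)
    return x

def cd_annealed(data, m=1, c=1.0, t0=10, T=5000):
    """CD-m with eta_t = c/t; returns the eta-weighted averaged iterate."""
    theta = np.zeros(3)
    num, den = np.zeros(3), 0.0
    pos = suff_stats(data).mean(axis=0)        # data term E_n[T(X)]
    for t in range(1, T + 1):
        x = data.copy()
        for _ in range(m):                     # m MCMC transitions from data
            x = gibbs_step(x, theta)
        neg = suff_stats(x).mean(axis=0)       # model term after m steps
        eta = c / t
        theta = theta + eta * (pos - neg)      # CD parameter update
        if t >= t0:                            # accumulate weighted average
            num += eta * theta
            den += eta
    return num / den

def exact_sample(theta_star, n):
    """Draw X_1^n exactly from p_{theta*} (the state space has 4 points)."""
    states = np.array([[s1, s2] for s1 in (-1.0, 1.0) for s2 in (-1.0, 1.0)])
    logits = suff_stats(states) @ theta_star
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return states[rng.choice(4, size=n, p=p)]

theta_star = np.array([0.5, -0.2, 0.3])
data = exact_sample(theta_star, 10000)
print(cd_annealed(data, m=1))                  # should approach theta_star
```

Under the paper's result, the gap between this averaged iterate and $\theta^*$ should shrink at rate $1/\sqrt{n}$ as the sample size grows, with $m$ changing only the constant factor.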