Fast Convergence for Langevin Diffusion with Matrix Manifold Structure

02/13/2020
by Ankur Moitra, et al.

In this paper, we study the problem of sampling from distributions of the form p(x) ∝ e^{-β f(x)} for a function f whose values and gradients we can query. This mode of access to f is natural in the scenarios in which such problems arise, for instance sampling from posteriors in parametric Bayesian models. Classical results show that a natural random walk, Langevin diffusion, mixes rapidly when f is convex. Unfortunately, even in simple examples, the applications listed above entail working with functions f that are nonconvex, for which sampling from p may in general require an exponential number of queries. In this paper, we study one aspect of nonconvexity relevant for modern machine learning applications: the existence of invariances (symmetries) in the function f, as a result of which the distribution p has manifolds of points with equal probability. We give a recipe for proving mixing time bounds for Langevin dynamics when sampling from manifolds of local optima of f in settings where the distribution is well-concentrated around them. We specialize our arguments to classic matrix factorization-like Bayesian inference problems in which we receive noisy measurements b of A(XX^T) for a low-rank matrix XX^T with X ∈ R^{d×k}, i.e. f(X) = ||A(XX^T) - b||_2^2, and β is the inverse of the variance of the noise. Such functions f are invariant under orthogonal transformations X ↦ XR, and include problems like matrix factorization, matrix sensing, and matrix completion. Beyond sampling, Langevin dynamics is a popular toy model for studying stochastic gradient descent. Along these lines, we believe our work is an important first step towards understanding how SGD behaves when there is a high degree of symmetry in the space of parameters that produce the same output.
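Since the abstract centers on running Langevin dynamics on f(X) = ||A(XX^T) - b||_2^2, the following is a minimal sketch of the unadjusted Langevin iteration X ← X - η ∇f(X) + sqrt(2η/β)·N(0, I), specialized to the simplest instance where the measurement operator A is the identity (plain matrix factorization, b = M*). The step size eta, inverse temperature beta, problem sizes, and helper names (grad_f, langevin_sample) are illustrative choices, not values or code from the paper.

```python
import numpy as np

def grad_f(X, M_star):
    # Gradient of f(X) = ||X X^T - M*||_F^2 for symmetric M*:
    # grad f(X) = 4 (X X^T - M*) X.
    return 4.0 * (X @ X.T - M_star) @ X

def langevin_sample(M_star, k, beta=100.0, eta=1e-3, n_steps=10_000, seed=0):
    # Unadjusted Langevin dynamics targeting p(X) ∝ exp(-beta * f(X)).
    rng = np.random.default_rng(seed)
    d = M_star.shape[0]
    X = rng.standard_normal((d, k))          # random initialization
    noise_scale = np.sqrt(2.0 * eta / beta)  # Langevin noise magnitude
    for _ in range(n_steps):
        X = X - eta * grad_f(X, M_star) + noise_scale * rng.standard_normal((d, k))
    return X  # an approximate sample, up to discretization error

# Usage: the target concentrates near the manifold {X* R : R orthogonal},
# so X X^T should approximate M* even though X itself is not unique.
d, k = 20, 3
rng = np.random.default_rng(1)
X_star = rng.standard_normal((d, k))
M_star = X_star @ X_star.T
X = langevin_sample(M_star, k)
print(np.linalg.norm(X @ X.T - M_star, "fro"))  # small once the chain settles
```

Note the role of the orthogonal invariance here: f(XR) = f(X) for any orthogonal R, so the sampler equilibrates over a whole manifold of equally likely factorizations rather than a single point, which is exactly the structure the paper's mixing analysis exploits.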
