Improving Optimization in Models With Continuous Symmetry Breaking

03/08/2018
by Robert Bamler, et al.

Many loss functions in representation learning are invariant under a continuous symmetry transformation. As an example, consider word embeddings (Mikolov et al., 2013), where the loss remains unchanged if we simultaneously rotate all word and context embedding vectors. We show that representation learning models with a continuous symmetry and a quadratic Markovian time series prior possess so-called Goldstone modes. These are low-cost deviations from the optimum that slow down convergence of gradient descent. We use tools from gauge theory in physics to design an optimization algorithm that solves the slow convergence problem. Our algorithm leads to a fast decay of Goldstone modes, to orders of magnitude faster convergence, and to more interpretable representations, as we show for dynamic extensions of matrix factorization and word embedding models. We present an example application: translating modern words into historic language using a shared representation space.
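To make the rotation invariance mentioned above concrete, here is a minimal sketch (not taken from the paper; the variable names, dimensions, and the simplified skip-gram-style loss are illustrative assumptions). A loss that depends on word and context embeddings only through their inner products is unchanged when both sets of vectors are rotated by the same orthogonal matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_words, n_pairs = 8, 50, 200          # illustrative sizes, not from the paper

U = rng.normal(size=(n_words, d))          # word embeddings
V = rng.normal(size=(n_words, d))          # context embeddings
pairs = rng.integers(0, n_words, size=(n_pairs, 2))  # observed (word, context) pairs

def loss(U, V):
    # Simplified skip-gram-style loss: negative log-sigmoid of the inner
    # products of observed word-context pairs.
    scores = np.einsum('ij,ij->i', U[pairs[:, 0]], V[pairs[:, 1]])
    return np.mean(np.log1p(np.exp(-scores)))

# Random orthogonal matrix Q via QR decomposition of a Gaussian matrix.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))

# Since (U @ Q) @ (V @ Q).T == U @ Q @ Q.T @ V.T == U @ V.T for orthogonal Q,
# the loss is identical (up to floating-point error) after the joint rotation.
print(loss(U, V))
print(loss(U @ Q, V @ Q))
```

Because the loss depends on the embeddings only through inner products, the two printed values agree up to floating-point error, which is the continuous rotational symmetry the abstract refers to.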
