Relative gradient optimization of the Jacobian term in unsupervised deep learning

06/26/2020
by Luigi Gresele et al.

Learning expressive probabilistic models that correctly describe the data is a ubiquitous problem in machine learning. A popular approach is to map the observations into a representation space with a simple joint distribution, which can typically be written as a product of its marginals, thus drawing a connection with the field of nonlinear independent component analysis. Deep density models have been widely used for this task, but their likelihood-based training requires estimating the log-determinant of the Jacobian and is computationally expensive, imposing a trade-off between computation and expressive power. In this work, we propose a new approach for exact likelihood-based training of such neural networks. Based on relative gradients, we exploit the matrix structure of neural network parameters to compute updates efficiently even in high-dimensional spaces; the computational cost of training is quadratic in the input size, in contrast with the cubic scaling of naive approaches. This allows fast training with objective functions involving the log-determinant of the Jacobian, without imposing constraints on its structure, in stark contrast to normalizing flows. An implementation of our method is available at https://github.com/fissoreg/relative-gradient-jacobian.
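To make the complexity claim concrete, here is a minimal sketch (not the authors' implementation; see the repository above for that) of the relative-gradient trick for a single invertible linear layer z = Wx. It assumes the per-sample gradient of the log-likelihood splits into a backpropagated outer product plus the gradient of the log|det W| term; the names W, x, delta and D are purely illustrative.

import numpy as np

# Minimal sketch of the relative-gradient trick for one invertible linear
# layer z = W @ x trained by maximum likelihood. Illustrative only.

D = 256
rng = np.random.default_rng(0)
W = rng.standard_normal((D, D)) / np.sqrt(D)  # square, invertible weights
x = rng.standard_normal(D)                    # one input sample
delta = rng.standard_normal(D)                # backpropagated error signal

# Euclidean gradient of the per-sample log-likelihood w.r.t. W:
# an outer product from backprop plus inv(W).T from the log|det W| term.
naive_grad = np.outer(delta, x) + np.linalg.inv(W).T  # the inverse is O(D^3)

# Relative gradient: right-multiply the Euclidean gradient by W.T @ W.
# The log-det term simplifies analytically (inv(W).T @ W.T @ W = W), and the
# data term keeps its outer-product structure, so a careful ordering of
# matrix-vector products keeps everything O(D^2), with no inversion required.
z = W @ x                                   # forward pass, O(D^2)
relative_grad = np.outer(delta, z @ W) + W  # equals naive_grad @ W.T @ W

# Sanity check: the cheap computation matches the naive one.
assert np.allclose(naive_grad @ (W.T @ W), relative_grad)

The update W <- W + eta * relative_grad then follows the relative gradient without ever forming a matrix inverse, which is what makes the quadratic per-sample cost quoted in the abstract attainable.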

