Lipschitz standardization for robust multivariate learning

by Adrián Javaloy, et al.

Current trends in machine learning rely on out-of-the-box gradient-based approaches. To mitigate numerical errors and improve the convergence of the learning process, a common empirical practice is to standardize or normalize the data. However, there is little theoretical analysis of why and when these methods improve the learning process. In this work, we first study these methods in the context of black-box variational inference, specifically analyzing the effect that scaling the data has on the smoothness of the optimization landscape. Our analysis shows that no general rule determines which of the existing data scaling methods, if any, will improve the learning process. Second, we highlight the issues that arise when dealing with multivariate data, due to the discrepancy in smoothness of the likelihood functions for different variables, and the inability to scale discrete data. Finally, we propose a novel Lipschitz standardization, and its extension for discrete data, which together overcome the aforementioned limitations. Specifically, as backed by our experiments, Lipschitz standardization i) favors fairer learning across the different variables in the data; and ii) results in faster and more accurate learning.
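The link between data scaling and smoothness can be illustrated with a minimal sketch. It is not the paper's black-box variational inference setting and not the proposed Lipschitz standardization: as a simplified stand-in, it assumes a least-squares objective f(w) = ||Xw - y||^2 / (2n), whose gradient has Lipschitz constant equal to the largest eigenvalue of X^T X / n. The helper names, the synthetic data, and the choice of "normalize" as division by the per-column maximum absolute value are all assumptions made for illustration.

```python
import numpy as np


def smoothness_constant(X):
    """Lipschitz constant of the gradient of f(w) = ||Xw - y||^2 / (2n).

    For this quadratic objective the Hessian is X^T X / n, so the constant
    is its largest eigenvalue. (Illustrative stand-in objective.)
    """
    n = X.shape[0]
    return np.linalg.eigvalsh(X.T @ X / n).max()


def standardize(X):
    # Zero mean and unit (population) standard deviation per column.
    return (X - X.mean(axis=0)) / X.std(axis=0)


def normalize(X):
    # Assumed convention: divide each column by its maximum absolute value.
    return X / np.abs(X).max(axis=0)


# Hypothetical data: two features on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) * np.array([1.0, 100.0])

L_raw = smoothness_constant(X)
L_std = smoothness_constant(standardize(X))
L_norm = smoothness_constant(normalize(X))

print(f"raw: {L_raw:.3g}  standardized: {L_std:.3g}  normalized: {L_norm:.3g}")
```

In this toy case both scalings shrink the constant by orders of magnitude relative to the raw data, yet they produce different values from each other. Which one helps more depends on the data and the objective, which is in line with the abstract's claim that no general rule applies.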




