Jitter: Random Jittering Loss Function

by Zhicheng Cai, et al.

Regularization plays a vital role in machine learning optimization. One recent regularization method, flooding, keeps the training loss fluctuating around a chosen flooding level. The intent is for the model to keep random-walking until it reaches a flat region of the loss landscape, which enhances generalization. However, the flooding level is a hyper-parameter that is difficult to select properly and uniformly. We propose a method called Jitter to address this. Jitter is essentially a random loss function. Before training, we randomly sample the Jitter point from a specific probability distribution; the flooding level is then replaced by the Jitter point to obtain a new objective, and the model is trained on that objective. Because the Jitter point acts as a random factor, it adds randomness to the loss function. This is consistent with the many random behaviors present in the learning process of a machine learning model, and it should make the model more robust. In addition, Jitter performs a random walk that divides the loss curve into small intervals and flips them over, ideally making the loss curve much flatter and improving generalization. Moreover, Jitter is a domain-, task-, and model-independent regularization method that can continue to train the model effectively after the training error has dropped to zero. Our experimental results show that Jitter improves model performance more significantly than the previous flooding method and makes the test loss curve descend twice.
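The abstract does not state the objective explicitly. As a minimal sketch, assume the flooding objective of Ishida et al. (2020), J~(θ) = |J(θ) - b| + b, with the fixed level b replaced by a sampled Jitter point j. While the loss is above j the gradient is the ordinary one (descent); below j its sign flips (ascent), which keeps the loss hovering around j instead of reaching zero. The function names, the uniform distribution on [low, high], the 1-D toy loss J(w) = w², and the choice to redraw j at every step are all hypothetical illustrations; the abstract only says that j is sampled from a specific probability distribution.

```python
import random


def flooding_loss(loss: float, b: float) -> float:
    """Flooding objective (Ishida et al., 2020): |J - b| + b for a fixed level b."""
    return abs(loss - b) + b


def jitter_loss(loss: float, jitter_point: float) -> float:
    """Same objective with the flooding level replaced by a sampled Jitter point."""
    return abs(loss - jitter_point) + jitter_point


def gradient_sign(loss: float, jitter_point: float) -> int:
    """d/dJ of |J - j| + j: +1 above the Jitter point (descent), -1 otherwise (ascent)."""
    return 1 if loss > jitter_point else -1


def sample_jitter_point(rng: random.Random, low: float, high: float) -> float:
    # Hypothetical choice of distribution: uniform over [low, high].
    return rng.uniform(low, high)


def train_toy(steps: int = 200, lr: float = 0.1,
              low: float = 0.05, high: float = 0.15, seed: int = 0) -> list:
    """Gradient descent on the toy loss J(w) = w**2 under the Jitter objective.

    Returns the loss after every step. Assumption: j is redrawn at each step.
    """
    rng = random.Random(seed)
    w = 2.0
    history = []
    for _ in range(steps):
        j = sample_jitter_point(rng, low, high)
        loss = w * w
        # Chain rule: d/dw (|J - j| + j) = sign(J - j) * dJ/dw, with dJ/dw = 2w.
        grad = gradient_sign(loss, j) * 2.0 * w
        w -= lr * grad
        history.append(w * w)
    return history


if __name__ == "__main__":
    losses = train_toy()
    print(f"min loss after warm-up: {min(losses[50:]):.4f}")
    print(f"max loss after warm-up: {max(losses[50:]):.4f}")
```

With lr = 0.1 the toy loss shrinks by a factor of 0.64 per step while it is above j and grows by a factor of 1.44 while it is below j. With j drawn from [0.05, 0.15], the loss therefore settles into a band of roughly 0.03 to 0.22 after a short descent instead of collapsing to zero, which is the fluctuating behavior the abstract describes.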




