On gradient descent training under data augmentation with on-line noisy copies

by   Katsuyuki Hagiwara, et al.

In machine learning, data augmentation (DA) is a technique for improving the generalization performance. In this paper, we mainly considered gradient descent of linear regression under DA using noisy copies of datasets, in which noise is injected into inputs. We analyzed the situation where random noisy copies are newly generated and used at each epoch; i.e., the case of using on-line noisy copies. Therefore, it is viewed as an analysis on a method using noise injection into training process by DA manner; i.e., on-line version of DA. We derived the averaged behavior of training process under three situations which are the full-batch training under the sum of squared errors, the full-batch and mini-batch training under the mean squared error. We showed that, in all cases, training for DA with on-line copies is approximately equivalent to a ridge regularization whose regularization parameter corresponds to the variance of injected noise. On the other hand, we showed that the learning rate is multiplied by the number of noisy copies plus one in full-batch under the sum of squared errors and the mini-batch under the mean squared error; i.e., DA with on-line copies yields apparent acceleration of training. The apparent acceleration and regularization effect come from the original part and noise in a copy data respectively. These results are confirmed in a numerical experiment. In the numerical experiment, we found that our result can be approximately applied to usual off-line DA in under-parameterization scenario and can not in over-parametrization scenario. Moreover, we experimentally investigated the training process of neural networks under DA with off-line noisy copies and found that our analysis on linear regression is possible to be applied to neural networks.


page 1

page 2

page 3

page 4


Optimize TSK Fuzzy Systems for Big Data Regression Problems: Mini-Batch Gradient Descent with Regularization, DropRule and AdaBound (MBGD-RDA)

Takagi-Sugeno-Kang (TSK) fuzzy systems are very useful machine learning ...

Adaptive Learning Rate Clipping Stabilizes Learning

Artificial neural network training with stochastic gradient descent can ...

Variance Suppression: Balanced Training Process in Deep Learning

Stochastic gradient descent updates parameters with summation gradient c...

On the Inherent Regularization Effects of Noise Injection During Training

Randomly perturbing networks during the training process is a commonly u...

The good, the bad and the ugly sides of data augmentation: An implicit spectral regularization perspective

Data augmentation (DA) is a powerful workhorse for bolstering performanc...

Disparity Between Batches as a Signal for Early Stopping

We propose a metric for evaluating the generalization ability of deep ne...

A New Learning Approach for Noise Reduction

Noise is a part of data whether the data is from measurement, experiment...

Please sign up or login with your details

Forgot password? Click here to reset