Initializing Perturbations in Multiple Directions for Fast Adversarial Training
Recent developments in the filed of Deep Learning have demonstrated that Deep Neural Networks(DNNs) are vulnerable to adversarial examples. Specifically, in image classification, an adversarial example can fool the well trained deep neural networks by adding barely imperceptible perturbations to clean images. Adversarial Training, one of the most direct and effective methods, minimizes the losses of perturbed-data to learn robust deep networks against adversarial attacks. It has been proven that using the fast gradient sign method (FGSM) can achieve Fast Adversarial Training. However, FGSM-based adversarial training may finally obtain a failed model because of overfitting to FGSM samples. In this paper, we proposed the Diversified Initialized Perturbations Adversarial Training (DIP-FAT) which involves seeking the initialization of the perturbation via enlarging the output distances of the target model in a random directions. Due to the diversity of random directions, the embedded fast adversarial training using FGSM increases the information from the adversary and reduces the possibility of overfitting. In addition to preventing overfitting, the extensive results show that our proposed DIP-FAT technique can also improve the accuracy of the clean data. The biggest advantage of DIP-FAT method: achieving the best banlance among clean-data, perturbed-data and efficiency.
READ FULL TEXT