MixTrain: Scalable Training of Formally Robust Neural Networks
There is an arms race to defend neural networks against adversarial examples. Notably, adversarially robust training and verifiably robust training are the two most promising defenses. Adversarially robust training scales well but cannot provide a provable guarantee that no adversarial example exists. We present an Interval Attack that reveals fundamental problems with the threat model assumed by adversarially robust training. In contrast, verifiably robust training provides sound guarantees, but it is computationally expensive and sacrifices accuracy, which prevents it from being applied in practice. In this paper, we propose two novel techniques for verifiably robust training, stochastic output approximation and dynamic mixed training, to address these challenges. They are based on two critical insights: (1) soundness is only needed for a subset of the training data; and (2) verifiable robustness and test accuracy become conflicting objectives after a certain point in verifiably robust training. On both the MNIST and CIFAR datasets, we achieve similar test accuracy and estimated robust accuracy against PGD attacks with 14× less training time compared to state-of-the-art adversarially robust training techniques. In addition, we obtain up to 95.2% verified robust accuracy as a bonus. Also, to achieve similar verified robust accuracy, we save up to 5× computation time and offer a 9.2% improvement in test accuracy compared to current state-of-the-art verifiably robust training techniques.
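To make the two ideas concrete, the following is a minimal sketch, not the authors' implementation: it assumes an L-infinity threat model, a small two-layer MLP, and interval bound propagation as the verification method, and all names and hyperparameters (TwoLayerNet, verified_robust_loss, mixed_training_step, alpha, sample_frac) are illustrative. It shows the verified robust loss being computed only on a random subset of each batch (stochastic output approximation) and mixed with the regular loss through a weight alpha (dynamic mixed training).

```python
# Illustrative sketch only: combines a regular loss with an interval-bound
# verified robust loss computed on a sampled subset of each batch, mixed by a
# weight alpha. Names, hyperparameters, and the IBP formulation are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLayerNet(nn.Module):
    def __init__(self, in_dim=784, hidden=128, classes=10):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, classes)

    def forward(self, x):
        return self.fc2(F.relu(self.fc1(x)))

    def interval_bounds(self, x, eps):
        # Propagate the L-infinity box [x - eps, x + eps] through the network.
        lo, hi = x - eps, x + eps
        for layer in (self.fc1, self.fc2):
            mid, rad = (lo + hi) / 2, (hi - lo) / 2
            mid = layer(mid)                      # affine map of the center
            rad = rad @ layer.weight.abs().t()    # |W| scales the radius
            lo, hi = mid - rad, mid + rad
            if layer is self.fc1:                 # ReLU is monotone
                lo, hi = F.relu(lo), F.relu(hi)
        return lo, hi

def verified_robust_loss(model, x, y, eps):
    # Worst-case logits: lower bound for the true class, upper bounds elsewhere.
    lo, hi = model.interval_bounds(x, eps)
    onehot = F.one_hot(y, hi.size(1)).bool()
    worst = torch.where(onehot, lo, hi)
    return F.cross_entropy(worst, y)

def mixed_training_step(model, opt, x, y, eps, alpha, sample_frac=0.25):
    # Regular loss on the full batch.
    loss_nat = F.cross_entropy(model(x), y)
    # Verified robust loss on a random subset (stochastic output approximation).
    k = max(1, int(sample_frac * x.size(0)))
    idx = torch.randperm(x.size(0))[:k]
    loss_rob = verified_robust_loss(model, x[idx], y[idx], eps)
    # Dynamic mixed training: alpha trades off the two objectives.
    loss = (1 - alpha) * loss_nat + alpha * loss_rob
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss_nat.item(), loss_rob.item()
```

In a full training loop, alpha could start small and grow over epochs, reflecting the second insight above that verifiable robustness and test accuracy become conflicting objectives later in training; the schedule shown here is an assumption, not the paper's prescribed one.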