Divide-and-Conquer Adversarial Detection
The vulnerability of deep neural networks to adversarial examples has become a major concern for deploying these models in sensitive domains. Devising a definitive defense against such attacks has proven challenging, and methods that rely on detecting adversarial samples have been shown to be effective only when the attacker is oblivious to the detection mechanism, i.e., in non-adaptive attacks. In this paper, we propose an effective and practical method for detecting adaptive/dynamic adversaries. In short, we train adversary-robust auxiliary detectors to discriminate in-class natural examples from adversarially crafted out-of-class examples. To identify a potential adversary, we first obtain the estimated class of the input using the classification system, and then use the corresponding detector to verify whether the input is a natural example of that class or an adversarially manipulated one. Experimental results on the MNIST and CIFAR10 datasets show that our method can withstand adaptive PGD attacks. Furthermore, we demonstrate that with our novel training scheme our models learn significantly more robust representations than with ordinary adversarial training.
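The two ideas in the abstract (training a per-class detector on natural in-class versus adversarial out-of-class inputs, then classifying and verifying at inference time) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several pieces are assumptions: the detector for class k is a hypothetical linear scorer w·x, out-of-class examples are perturbed with L-infinity PGD to raise that score (i.e. to pass as class k), and the names `pgd_linf`, `detector_batch`, `detect`, the threshold of 0, and the toy classifier and detectors are all made up for illustration.

```python
import numpy as np

def pgd_linf(x, grad_fn, eps=0.3, alpha=0.01, steps=40, rng=None):
    """L-infinity PGD ascent: step along sign(grad), project to the eps-ball and [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    x_adv = np.clip(x + rng.uniform(-eps, eps, size=x.shape), 0.0, 1.0)  # random start
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # stay within eps of the original
        x_adv = np.clip(x_adv, 0.0, 1.0)          # stay a valid image
    return x_adv

def detector_batch(x_in, x_out, w, eps=0.3, rng=None):
    """Hypothetical training batch for the class-k detector with score w.x.

    Label 1: natural examples of class k.
    Label 0: out-of-class examples pushed by PGD to raise the detector score.
    """
    grad_fn = lambda x: np.broadcast_to(w, x.shape)  # gradient of w.x w.r.t. x
    x_adv = pgd_linf(x_out, grad_fn, eps=eps, rng=rng)
    inputs = np.concatenate([x_in, x_adv])
    labels = np.concatenate([np.ones(len(x_in)), np.zeros(len(x_adv))])
    return inputs, labels

def detect(x, classifier, detectors, threshold=0.0):
    """Divide and conquer: classify, then let that class's detector verify."""
    k = int(np.argmax(classifier(x)))  # step 1: estimated class
    score = detectors[k](x)            # step 2: class-k detector score
    return k, bool(score < threshold)  # low score -> flag as adversarial

# Toy usage (hypothetical models): the input itself serves as logits,
# and detector k scores x[k] - 1.
classifier = lambda x: x
detectors = [lambda x, k=k: float(x[k] - 1.0) for k in range(3)]
print(detect(np.array([0.1, 2.0, 0.3]), classifier, detectors))  # (1, False)
print(detect(np.array([0.1, 0.5, 0.3]), classifier, detectors))  # (1, True)
```

The point of the split is that each detector only has to separate one class from adversarial imposters of that class, instead of one detector covering every class at once.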