Learnable Boundary Guided Adversarial Training
Previous adversarial training improves model robustness at the cost of accuracy on natural data. In this paper, our goal is to reduce this degradation in natural accuracy. We use the logits of a clean model ℳ^natural to guide the learning of the robust model ℳ^robust, since the logits of a well-trained clean model embed the most discriminative features of natural data, e.g., a generalizable classifier boundary. Our solution is to constrain the logits of the robust model ℳ^robust, which takes adversarial examples as input, to be similar to those of the clean model ℳ^natural fed with the corresponding natural data. This lets ℳ^robust inherit the classifier boundary of ℳ^natural. We therefore name our method Boundary Guided Adversarial Training (BGAT). Moreover, we generalize BGAT to Learnable Boundary Guided Adversarial Training (LBGAT) by training ℳ^natural and ℳ^robust simultaneously and collaboratively, so that they learn the classifier boundary most favorable to robustness. Extensive experiments are conducted on CIFAR-10, CIFAR-100, and the challenging Tiny ImageNet dataset. When combined with other state-of-the-art adversarial training approaches, e.g., Adversarial Logit Pairing (ALP) and TRADES, performance is further enhanced.
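The logit constraint described above can be illustrated with a small sketch. The abstract does not say which similarity measure or loss weighting is used, so the code below assumes a mean squared error between the two sets of logits and, for the jointly trained (LBGAT-style) case, adds a cross-entropy term on the clean model's logits. The function names and the `guide_weight` parameter are hypothetical and chosen for illustration only; the adversarial example generation step is omitted.

```python
import math

def logit_guidance_loss(robust_logits_adv, natural_logits_clean):
    """Mean squared difference between two batches of logits.

    robust_logits_adv: logits of the robust model on adversarial inputs.
    natural_logits_clean: logits of the clean model on the matching natural inputs.
    Squared error is an assumption; the abstract only says "similar".
    """
    total, count = 0.0, 0
    for r_row, n_row in zip(robust_logits_adv, natural_logits_clean):
        for r, n in zip(r_row, n_row):
            total += (r - n) ** 2
            count += 1
    return total / count

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy, computed with the log-sum-exp trick."""
    total = 0.0
    for row, y in zip(logits, labels):
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[y]
    return total / len(labels)

def joint_guided_loss(robust_logits_adv, natural_logits_clean, labels,
                      guide_weight=1.0):
    """Assumed joint objective: the clean model fits the labels while the
    robust model's logits are pulled toward the clean model's logits."""
    ce = cross_entropy(natural_logits_clean, labels)
    guide = logit_guidance_loss(robust_logits_adv, natural_logits_clean)
    return ce + guide_weight * guide
```

In a BGAT-style setup the clean model would be pretrained and held fixed, so only the guidance term would update ℳ^robust; in the joint setup both terms are optimized together, which is what allows the boundary itself to adapt.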