How to Robustify Black-Box ML Models? A Zeroth-Order Optimization Perspective

by Yimeng Zhang, et al.

The lack of adversarial robustness has been recognized as an important issue for state-of-the-art machine learning (ML) models, e.g., deep neural networks (DNNs). Robustifying ML models against adversarial attacks is therefore now a major focus of research. However, nearly all existing defense methods, particularly those for robust training, make the white-box assumption that the defender has access to the details of an ML model (or its surrogate alternatives, if available), e.g., its architecture and parameters. Going beyond existing work, in this paper we aim to address the problem of black-box defense: how can a black-box model be robustified using only input queries and output feedback? Such a problem arises in practical scenarios where the owner of the predictive model is reluctant to share model information in order to preserve privacy. To this end, we propose a general notion of defensive operation that can be applied to black-box models, and design it through the lens of denoised smoothing (DS), a first-order (FO) certified defense technique. To allow the design to use only model queries, we further integrate DS with zeroth-order (ZO), i.e., gradient-free, optimization. However, a direct implementation of ZO optimization suffers from high variance in its gradient estimates, and thus leads to an ineffective defense. To tackle this problem, we next propose to prepend an autoencoder (AE) to a given (black-box) model so that DS can be trained using variance-reduced ZO optimization. We term the resulting defense ZO-AE-DS. Empirically, we show that ZO-AE-DS achieves improved accuracy, certified robustness, and query complexity over existing baselines. The effectiveness of our approach is demonstrated on both image classification and image reconstruction tasks. Codes are available at
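
To make the query-only setting concrete, here is a minimal sketch of two standard zeroth-order gradient estimators. It is not the authors' implementation: the abstract does not say which estimator ZO-AE-DS uses, and the function names `zo_gradient_rge` and `zo_gradient_cge`, the smoothing parameter `mu`, and the toy objective are all assumptions made for illustration. The randomized estimator averages finite differences along random unit directions, and its variance grows with the input dimension. The coordinate-wise estimator is deterministic but needs one extra query per coordinate.

```python
import numpy as np


def zo_gradient_rge(f, x, num_queries=20, mu=1e-3, rng=None):
    """Randomized ZO gradient estimate of a scalar black-box function f at x.

    Uses only function queries: averages forward differences taken along
    random unit directions. (Hypothetical helper, for illustration only.)
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.size
    f0 = f(x)  # one query at the base point, reused for every direction
    grad = np.zeros_like(x, dtype=float)
    for _ in range(num_queries):
        u = rng.standard_normal(x.shape)
        u /= np.linalg.norm(u)  # direction drawn uniformly on the unit sphere
        grad += (d / mu) * (f(x + mu * u) - f0) * u
    return grad / num_queries


def zo_gradient_cge(f, x, mu=1e-3):
    """Coordinate-wise ZO gradient estimate (forward differences).

    Deterministic, but costs x.size + 1 queries, so it is only practical
    when the optimized variable is low-dimensional.
    """
    f0 = f(x)
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step.flat[i] = mu  # perturb a single coordinate
        grad.flat[i] = (f(x + step) - f0) / mu
    return grad


if __name__ == "__main__":
    # Toy stand-in for a black-box loss; the true gradient is 2 * x.
    def black_box(z):
        return float(np.sum(z ** 2))

    x0 = np.array([1.0, 2.0, 3.0])
    print("CGE:", zo_gradient_cge(black_box, x0))
    print("RGE:", zo_gradient_rge(black_box, x0, num_queries=200))
```

On this toy objective the coordinate-wise estimate is close to the true gradient after `x.size + 1` queries, while the randomized estimate is noisy unless many directions are averaged. This is the trade-off between variance and query cost that the abstract refers to.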


