Unstoppable Attack: Label-Only Model Inversion via Conditional Diffusion Model

by Rongke Liu, et al.

Model inversion attacks (MIAs) aim to recover private data from a target model's training set, which poses a threat to the privacy of deep learning models. Most MIAs focus on the white-box scenario, where the attacker has full access to the structure and parameters of the target model. In practice, however, deployed models are typically black-box: adversaries cannot easily obtain model parameters, and many models output only predicted labels. Existing black-box MIAs have primarily focused on designing the optimization strategy, while the generative model is simply carried over from the GANs used in white-box MIAs. To the best of our knowledge, our work is the first study of feasible attack models in the label-only black-box scenario. In this paper, we develop a novel MIA method that uses a conditional diffusion model to recover precise samples of the target without any extra optimization, as long as the target model outputs a label. The attack relies on two primary techniques. First, we select an auxiliary dataset relevant to the target model's task and use the labels predicted by the target model as conditions to guide the training process. Second, we feed target labels and noise drawn from a standard normal distribution into the trained conditional diffusion model, generating target samples with a pre-defined guidance strength. We then select the most robust and representative samples. Furthermore, we propose, for the first time, using Learned Perceptual Image Patch Similarity (LPIPS) as an evaluation metric for MIAs, and we provide a systematic quantitative and qualitative evaluation in terms of attack accuracy, realism, and similarity. Experimental results show that the method generates data that is accurate and similar to the target without optimization, and that it outperforms the generators of previous approaches in the label-only scenario.
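
The two techniques described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not give the guidance formula, so the sketch assumes the common classifier-free guidance mix, (1 + w) * eps_cond - w * eps_uncond. The names `label_auxiliary_data`, `guided_noise`, and `toy_target` are hypothetical, and `toy_target` is a toy stand-in for a label-only target model.

```python
import numpy as np


def label_auxiliary_data(aux_images, query_label):
    """Step 1: query the black-box target once per auxiliary image.

    Only the hard label is kept, matching the label-only setting.
    These labels would later serve as conditions when training the
    conditional diffusion model.
    """
    return np.array([query_label(x) for x in aux_images], dtype=np.int64)


def guided_noise(eps_cond, eps_uncond, w):
    """Step 2 (assumed form): classifier-free guidance with strength w.

    eps_cond   -- noise predicted with the target label as condition
    eps_uncond -- noise predicted with the condition dropped
    w = 0 reduces to the purely conditional prediction.
    """
    return (1.0 + w) * eps_cond - w * eps_uncond


def toy_target(x):
    """Hypothetical label-only target: returns class 0 or 1, nothing else."""
    return int(x.mean() > 0.5)


rng = np.random.default_rng(0)
aux = rng.random((8, 4, 4))          # stand-in auxiliary "images"
labels = label_auxiliary_data(aux, toy_target)

eps_c = np.ones(3)                   # stand-in conditional prediction
eps_u = np.zeros(3)                  # stand-in unconditional prediction
mixed = guided_noise(eps_c, eps_u, w=2.0)
```

In a full attack, the two stand-in predictions would come from the trained conditional diffusion model at each denoising step, and a larger `w` would push the samples more strongly toward the target label.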




