Intriguing Properties of Diffusion Models: A Large-Scale Dataset for Evaluating Natural Attack Capability in Text-to-Image Generative Models

by Takami Sato, et al.

Denoising probabilistic diffusion models have shown breakthrough performance, generating more photo-realistic images and human-level illustrations than prior models such as GANs. This high image-generation capability has stimulated the creation of many downstream applications in various areas. However, we find that this technology is a double-edged sword: we identify a new type of attack, called the Natural Denoising Diffusion (NDD) attack, based on the finding that state-of-the-art deep neural network (DNN) models still hold their prediction even if we intentionally remove, through text prompts, the robust features that are essential to the human visual system (HVS). The NDD attack can generate low-cost, model-agnostic, and transferable adversarial attacks by exploiting the natural attack capability in diffusion models. Motivated by this finding, we construct a large-scale dataset, the Natural Denoising Diffusion Attack (NDDA) dataset, to systematically evaluate the risk of the natural attack capability of state-of-the-art text-to-image diffusion models. We evaluate the natural attack capability by answering 6 research questions. Through a user study to confirm the validity of the NDD attack, we find that the NDD attack can achieve an 88% detection rate while being stealthy to 93% of human subjects. We also find that the non-robust features embedded by diffusion models contribute to the natural attack capability. To confirm the model-agnostic and transferable attack capability, we perform the NDD attack against an autonomous driving (AD) vehicle and find that 73% of the physically printed attacks can be detected as a stop sign. We hope that our study and dataset can help our community be aware of the risk of diffusion models and facilitate further research toward robust DNN models.
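
As a rough illustration of the evaluation idea in the abstract, the Python sketch below builds text prompts that ask a generator to omit chosen robust features, and computes a detection rate from a list of per-image detector outcomes. The feature names (`shape`, `color`, `text`), the prompt wording, and the helper functions `removal_sets`, `build_prompt`, and `detection_rate` are all assumptions made for illustration; they are not taken from the paper or the NDDA dataset, and no real generator or detector is called.

```python
from itertools import combinations
from typing import Iterable, Iterator, Sequence, Tuple

# Hypothetical robust features of an object class (illustrative only,
# not the feature list used by the authors).
FEATURES = ("shape", "color", "text")


def removal_sets(features: Sequence[str]) -> Iterator[Tuple[str, ...]]:
    """Yield every subset of features that could be removed (including none)."""
    for k in range(len(features) + 1):
        yield from combinations(features, k)


def build_prompt(obj: str, removed: Iterable[str]) -> str:
    """Build a text prompt asking for `obj` without the listed features.

    The wording is an assumption; a real study would tune prompts per model.
    """
    removed = sorted(removed)
    if not removed:
        return f"a photo of a {obj}"
    return f"a photo of a {obj} without its usual " + " and ".join(removed)


def detection_rate(detected: Sequence[bool]) -> float:
    """Fraction of generated images that a detector still labels as the object."""
    if not detected:
        raise ValueError("need at least one detection outcome")
    return sum(detected) / len(detected)


if __name__ == "__main__":
    for subset in removal_sets(FEATURES):
        print(build_prompt("stop sign", subset))
    # Placeholder outcomes; in practice these would come from running an
    # object detector on the generated images.
    print(detection_rate([True, True, False, True]))
```

In such a setup, a high detection rate on prompts that remove features humans rely on would be the signal of interest; the stealthiness figure in the abstract comes from a separate user study and is not modeled here.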




