Distilling Adversarial Prompts from Safety Benchmarks: Report for the Adversarial Nibbler Challenge

09/20/2023
by Manuel Brack, et al.

Text-conditioned image generation models have recently achieved astonishing image quality and alignment results. Consequently, they are employed in a fast-growing number of applications. Since they are highly data-driven, relying on billion-sized datasets randomly scraped from the web, they also produce unsafe content. As a contribution to the Adversarial Nibbler challenge, we distill a large set of over 1,000 potential adversarial inputs from existing safety benchmarks. Our analysis of the gathered prompts and corresponding images demonstrates the fragility of input filters and provides further insights into systematic safety issues in current generative image models.
