Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses

by Micah Goldblum, et al.
University of Maryland

As machine learning systems grow in scale, so do their training data requirements, forcing practitioners to automate and outsource the curation of training data in order to achieve state-of-the-art performance. The absence of trustworthy human supervision over the data collection process exposes organizations to security vulnerabilities: training data can be manipulated to control and degrade the downstream behavior of learned models. The goal of this work is to systematically categorize and discuss a wide range of dataset vulnerabilities and exploits, approaches for defending against these threats, and an array of open problems in this space. In addition to describing various poisoning and backdoor threat models and the relationships among them, we develop a unified taxonomy that encompasses them.
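To make "training data can be manipulated to control downstream behavior" concrete, the sketch below shows one simple backdoor threat model: a dirty-label patch-trigger attack. A small fraction of training images receive a fixed pixel pattern and have their labels changed to an attacker-chosen target class. A model trained on the result may learn to associate the pattern with that class. This is an illustrative assumption, not the authors' method. The helper names `add_trigger` and `poison_dataset`, the corner placement, the 3x3 patch size, and the toy 8x8 images are all hypothetical choices.

```python
import random
from typing import List, Set, Tuple

# Hypothetical representation: a grayscale image is a list of rows of floats in [0, 1].
Image = List[List[float]]
Example = Tuple[Image, int]


def add_trigger(image: Image, patch_size: int = 3, value: float = 1.0) -> Image:
    """Return a copy of `image` with a solid square trigger in the bottom-right corner."""
    stamped = [row[:] for row in image]  # copy rows so the clean image is not mutated
    h, w = len(stamped), len(stamped[0])
    for r in range(h - patch_size, h):
        for c in range(w - patch_size, w):
            stamped[r][c] = value
    return stamped


def poison_dataset(
    data: List[Example], target_label: int, poison_fraction: float, seed: int = 0
) -> Tuple[List[Example], Set[int]]:
    """Stamp the trigger on a random subset of examples and relabel them as `target_label`.

    Returns the poisoned dataset and the indices that were modified.
    """
    rng = random.Random(seed)
    n_poison = int(round(poison_fraction * len(data)))
    poison_idx = set(rng.sample(range(len(data)), n_poison))
    poisoned: List[Example] = []
    for i, (img, label) in enumerate(data):
        if i in poison_idx:
            poisoned.append((add_trigger(img), target_label))  # dirty label: flipped to target
        else:
            poisoned.append((img, label))
    return poisoned, poison_idx


# Toy usage: 100 blank 8x8 images with labels 0..9, poisoning 5% toward class 7.
clean = [([[0.0] * 8 for _ in range(8)], i % 10) for i in range(100)]
poisoned, idx = poison_dataset(clean, target_label=7, poison_fraction=0.05)
print(len(idx))  # number of poisoned examples
```

Relabeling makes this a "dirty-label" attack, and a human inspecting the data could notice the mismatch. Clean-label variants instead leave labels intact and perturb only the inputs, which makes them harder to spot during curation.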



