Data Isotopes for Data Provenance in DNNs

by Emily Wenger, et al.

Today, creators of data-hungry deep neural networks (DNNs) scour the Internet for training fodder, leaving users with little control over or knowledge of when their data is appropriated for model training. To empower users to counteract unwanted data use, we design, implement, and evaluate a practical system that enables users to detect if their data was used to train a DNN model. We show how users can create special data points we call isotopes, which introduce "spurious features" into DNNs during training. With only query access to a trained model, and no knowledge of the model training process or control of the data labels, a user can apply statistical hypothesis testing to detect if a model has learned the spurious features associated with their isotopes by training on the user's data. This effectively turns DNNs' vulnerability to memorization and spurious correlations into a tool for data provenance. Our results confirm efficacy in multiple settings, detecting and distinguishing between hundreds of isotopes with high accuracy. We further show that our system works on public ML-as-a-service platforms and on larger models such as those trained on ImageNet, can use physical objects instead of digital marks, and remains generally robust against several adaptive countermeasures.
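The query-only detection step described above can be illustrated with a minimal sketch. The abstract says only that the user applies "statistical hypothesis testing"; the one-sided sign test used here is a simple stand-in chosen for illustration, not necessarily the authors' test. The names `query_model`, `add_mark`, `target_class`, and `alpha` are hypothetical: `query_model` is assumed to return a list of class probabilities for one input, and `add_mark` is assumed to overlay the user's isotope mark on an input.

```python
import math
from typing import Callable, Sequence


def sign_test_p_value(with_mark: Sequence[float],
                      without_mark: Sequence[float]) -> float:
    """One-sided exact sign test on paired samples.

    H0: adding the mark does not raise the target-class probability.
    (Illustrative stand-in; the source does not name a specific test.)
    """
    if len(with_mark) != len(without_mark):
        raise ValueError("paired samples must have equal length")
    # Ties carry no sign information, so they are dropped.
    diffs = [a - b for a, b in zip(with_mark, without_mark) if a != b]
    n = len(diffs)
    if n == 0:
        return 1.0
    wins = sum(1 for d in diffs if d > 0)
    # P(X >= wins) for X ~ Binomial(n, 0.5)
    return sum(math.comb(n, k) for k in range(wins, n + 1)) / 2 ** n


def detect_isotope(query_model: Callable[[object], Sequence[float]],
                   clean_inputs: Sequence[object],
                   add_mark: Callable[[object], object],
                   target_class: int,
                   alpha: float = 0.05) -> bool:
    """Return True if the mark significantly raises the target-class probability.

    Only query access is used: each input is sent once without the
    mark and once with it (hypothetical interface, see lead-in).
    """
    without = [query_model(x)[target_class] for x in clean_inputs]
    marked = [query_model(add_mark(x))[target_class] for x in clean_inputs]
    return sign_test_p_value(marked, without) < alpha
```

A rejected null hypothesis suggests the model responds to the mark as a learned feature, which is consistent with it having trained on the marked data. A real deployment would also need to choose `alpha` and the number of queries with the false-positive rate in mind.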



Use of Metamorphic Relations as Knowledge Carriers to Train Deep Neural Networks

Training multiple-layered deep neural networks (DNNs) is difficult. The ...

The Optimal ANN Model for Predicting Bearing Capacity of Shallow Foundations Trained on Scarce Data

This study is focused on determining the potential of using deep neural ...

Analysis of Generalizability of Deep Neural Networks Based on the Complexity of Decision Boundary

For supervised learning models, the analysis of generalization ability (...

Strike (with) a Pose: Neural Networks Are Easily Fooled by Strange Poses of Familiar Objects

Despite excellent performance on stationary test sets, deep neural netwo...

CAMEL: Co-Designing AI Models and Embedded DRAMs for Efficient On-Device Learning

The emergence of the Internet of Things (IoT) has resulted in a remarkab...

A Novel DNN Training Framework via Data Sampling and Multi-Task Optimization

Conventional DNN training paradigms typically rely on one training set a...

PaRoT: A Practical Framework for Robust Deep Neural Network Training

Deep Neural Networks (DNNs) are finding important applications in safety...
